Cells must continuously sense and respond to time-varying environmental stimuli. These signals are transmitted and processed by biochemical signalling networks. However, the biochemical reactions making up these networks are intrinsically noisy, which limits the reliability of intracellular signalling. Here we use information theory to characterise the reliability of transmission of time-varying signals through elementary biochemical reactions in the presence of noise. We calculate the mutual information for both instantaneous measurements and trajectories of biochemical systems for a Gaussian model. Our results indicate that the same network can have radically different characteristics for the transmission of instantaneous signals and trajectories. For trajectories, the ability of a network to respond to changes in the input signal is determined by the timing of reaction events, and is independent of the correlation time of the output of the network. We also study how reliably signals on different time-scales can be transmitted by considering the frequency-dependent coherence and gain-to-noise ratio. We find that a detector that does not consume the ligand molecule upon detection can more reliably transmit slowly varying signals, while an absorbing detector can more reliably transmit rapidly varying signals. Furthermore, we find that while one reaction may more reliably transmit information than another when considered in isolation, when placed within a signalling cascade the relative performance of the two reactions can be reversed. This means that optimising signal transmission at a single level of a signalling cascade can reduce signalling performance for the cascade as a whole.



I. INTRODUCTION 

Cells are continually exposed to a wide range of environmental signals to which they must react. These stimuli are transmitted and processed within the cell by networks of proteins and interactions. However, the biochemical reactions which make up these networks are inherently stochastic events. Recent experiments have shown that fluctuations associated with this spontaneous reaction noise can have significant effects on cell phenotype [1–3]. In signalling networks, the effect of this inevitable biochemical noise will be to disrupt the transmission of signals. Random fluctuations mean that a single input signal can give rise to a distribution of possible responses. Conversely, a particular response can be generated from a number of input signals. This uncertainty compromises the ability of a cell to respond correctly to its environment. It is therefore important to understand how reliably signals can be transmitted through signalling networks in the presence of noise.

A quantitative framework for analysing the reliability of signal communication in the presence of noise is provided by information theory [?]. The application of information theory to neural and sensory signalling has a long history [?], with particular focus on the reliable encoding of external stimuli in neuronal spiking patterns (for example [?]). However, the use of these techniques in the analysis of intracellular biochemical signalling and gene-regulatory networks has, until recently, received less attention. In this context a number of studies have considered the reliability of transmission of signals through specific reaction systems in the presence of noise [10–?] and subject to constraints such as metabolic cost [?]. Additionally, the impact of network topology on the transmission of constant signals in generic small gene-regulation and protein-signalling motifs has been investigated [?, 17–20].

Here we analyse the signalling characteristics of a number of elementary biochemical reactions for time-varying signals. Specifically, the fidelity of information transmission between the input and output signals of a biochemical network is measured by the mutual information. Most previous analyses of the mutual information for simple reaction motifs [12, 13, 17–20] have considered only the response of a network to signals which do not change on the time-scale of the network response. For many systems the assumption that the signal is constant may not be valid. Cells are often exposed to rapidly varying environments, to which they should also present a time-varying response. Notably, Levine et al. [?] calculated the mutual information of an enzymatic push-pull network in response to a signal pulse. However, they considered only the information about a restricted two-state input signal that can be extracted from the instantaneous output level, and did not take into account the transmission of other dynamical properties of the input. Biochemical networks often respond not only to the instantaneous values of time-varying signals but also to other characteristics of the stimulus. For example, the chemotaxis network of the bacterium E. coli is sensitive to changes in the level of chemoattractants in the environment, but adapts to constant signals [?, 22]. Another example of sensitivity to environmental changes is the osmotic shock response in budding yeast, where the cell reacts only to changes in osmotic pressure [23]. Cells can also make use of temporal properties of signals to encode information. For example, in calcium signalling it is believed that information is encoded in the frequency and duration of Ca²⁺ bursts, rather than the concentration at any point in time [24]. In rat PC-12 cells, stimulation with epidermal growth factor leads to a transient response of the MAP-kinase pathway, while nerve growth factor leads to a sustained response [25]. In order to understand the function of these signalling systems we must consider the transmission of signals that are functions of time (trajectories) through the network.

We have previously discussed the mutual information rate between trajectories of biochemical networks [26], and applied these techniques to small network motifs. This analysis can readily be extended to networks with an arbitrary number of components and more complex network motifs, such as feedback [27] and feedforward [28] loops. However, in order to make the calculations analytically tractable, even in the simplest cases, we must make a number of simplifying assumptions. In short, we assume that the network of interest can be described by a Gaussian model; that is, the distribution of input or output signal values at any point in time is Gaussian, as is the conditional distribution between any two points. For these approximate model systems the mutual information rate can be calculated analytically. For systems that do not satisfy these assumptions the Gaussian model provides a lower bound on the rate of information transmission.

In this paper we calculate the mutual information between instantaneous signals and the information rate for trajectories within the Gaussian approximation for a number of elementary biochemical reactions. We discuss in detail the assumptions and validity of this model. Our results show that, for signal trajectories, information is encoded in the timing of reaction events in signalling cascades. We also compare the performance of different reactions in isolation and within a simple linear signalling cascade. We find that while a production reaction can transmit some signals more reliably than an irreversible conversion reaction, when placed within a signalling cascade driven by an external upstream signal the relative performance of the two reactions is reversed. Importantly, this shows that increasing the reliability of signal propagation for a single step in a cascade does not necessarily improve, and can in fact degrade, signalling performance for the cascade as a whole. Our results also show that a detection reaction which irreversibly consumes its signal substrate can allow for more reliable information transmission of an upstream signal by reducing noise propagation through the signalling network.



II. FORMULATION: THE GAUSSIAN MODEL 

We aim to quantify the performance of biochemical signalling networks by considering how accurately the network input S can be translated into the network output X. We additionally allow S and X to vary over time, so the "signal" which the network is required to transmit is the trajectory S(t) over some time interval, and the output is similarly the trajectory X(t). As a measure of signalling performance we use the mutual information [?] between the input and output trajectories. Formally, the mutual information is defined in terms of probability distributions over the possible trajectories,



$$ I(S,X) = \int \mathcal{D}[S(t)] \int \mathcal{D}[X(t)]\; p(S(t),X(t)) \log\frac{p(S(t),X(t))}{p(S(t))\,p(X(t))} \qquad (1) $$



However, a direct evaluation of Eq. (1), either analytically or numerically, is generally not possible because the space of possible trajectories is infinite-dimensional. In order to proceed we approximate the dynamics of the system by a multivariate Gaussian model, for which the mutual information can be calculated exactly. In this section we discuss the application of the standard Gaussian communication channel model [?] to time-varying biochemical networks, and the assumptions and approximations which go into this model.

We take as the input and output signals the deviations of S and X from their average values, $s(t) = S(t) - \langle S(t)\rangle$ and $x(t) = X(t) - \langle X(t)\rangle$, where $\langle\cdot\rangle$ represents averaging over different realisations of the dynamical system. We assume that $s(t)$ and $x(t)$ are jointly Gaussian processes; that is, the joint probability distribution of any two values of these processes, $p(\alpha(t), \beta(t'))$ for $\alpha, \beta = s$ or $x$, is a bivariate Gaussian.

For example, we can construct a vector containing the signal values at discrete sample times, $\mathbf{s} = (s(t_1), s(t_2), \ldots, s(t_N))$, and similarly for $\mathbf{x}$. In general, the trajectories $\mathbf{s}$ and $\mathbf{x}$ can be of different lengths, $N_s$ and $N_x$. In the Gaussian approximation the joint distribution of $(\mathbf{s}, \mathbf{x})$ is given by



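$$ p(\mathbf{s},\mathbf{x}) = (2\pi)^{-(N_s+N_x)/2}\, |Z|^{-1/2} \exp\left[-\frac{1}{2}\,(\mathbf{s}\;\;\mathbf{x})\, Z^{-1} (\mathbf{s}\;\;\mathbf{x})^{\mathsf{T}}\right] \qquad (2) $$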

The $(N_s + N_x) \times (N_s + N_x)$ covariance matrix $Z$ has the block form

$$ Z = \begin{pmatrix} C_{ss} & C_{xs} \\ C_{sx} & C_{xx} \end{pmatrix} \qquad (3) $$



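where $C_{\alpha\beta}$ is an $N_\beta \times N_\alpha$ matrix with elements given by the correlation function $(C_{\alpha\beta})_{ij} = C_{\alpha\beta}(t_j, t_i) = \langle \alpha(t_j)\beta(t_i)\rangle$. Furthermore, the form of Eq. (2) means that the distribution $p(\alpha(t)|\beta(t'))$ of any value conditional on any other is also Gaussian, with variance $C_{\alpha\alpha}(t,t) - [C_{\alpha\beta}(t,t')]^2/C_{\beta\beta}(t',t')$.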

Shannon [?] showed that the entropy of an $N$-variate Gaussian distribution with covariance matrix $C$ is

$$ H_G = \frac{1}{2}\log\left[(2\pi e)^N |C|\right] \qquad (4) $$
From the definition of the mutual information, therefore,

$$ I(S,X) = H(S) + H(X) - H(S,X) = \frac{1}{2}\log\frac{|C_{ss}|\,|C_{xx}|}{|Z|} \qquad (5) $$

The problem of calculating the mutual information between trajectories in the Gaussian model reduces to calculating the determinants of the covariance matrices, a great simplification over the functional integration over ensembles of trajectories in Eq. (1). In this manuscript we shall consider two special cases for which the mutual information can readily be evaluated analytically. Specifically, we shall consider a single instantaneous measurement of the system, corresponding to $N_s = N_x = 1$, and infinite continuous trajectories, $N_s = N_x \to \infty$ and $t_i - t_{i-1} \to 0$. In this latter case the mutual information rate can be expressed in terms of the spectra of eigenvalues of the covariance matrices.
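As a minimal numerical sketch of Eq. (5), assuming NumPy and natural logarithms (so information is measured in nats), the mutual information of a jointly Gaussian vector can be computed from log-determinants of the blocks of $Z$. The function name `gaussian_mutual_information` is hypothetical, and the block ordering $(\mathbf{s}, \mathbf{x})$ follows Eq. (3).

```python
import numpy as np

def gaussian_mutual_information(z, n_s):
    """Mutual information (nats) between the first n_s components (s) and the
    remaining components (x) of a zero-mean jointly Gaussian vector with
    covariance matrix z, via I = 1/2 log(|C_ss| |C_xx| / |Z|), Eq. (5)."""
    z = np.asarray(z, dtype=float)
    c_ss = z[:n_s, :n_s]
    c_xx = z[n_s:, n_s:]
    # slogdet returns (sign, log|det|) and avoids overflow for large matrices
    logdet_ss = np.linalg.slogdet(c_ss)[1]
    logdet_xx = np.linalg.slogdet(c_xx)[1]
    logdet_z = np.linalg.slogdet(z)[1]
    return 0.5 * (logdet_ss + logdet_xx - logdet_z)

# Single measurement (N_s = N_x = 1) with correlation coefficient r:
r = 0.6
z_single = np.array([[1.0, r], [r, 1.0]])
info_single = gaussian_mutual_information(z_single, 1)  # equals -0.5*log(1 - r**2)

# Uncorrelated blocks carry no information:
z_indep = np.diag([2.0, 1.0, 3.0, 0.5])
info_indep = gaussian_mutual_information(z_indep, 2)    # equals 0
```

Working with log-determinants rather than determinants keeps the calculation stable when the trajectories are long and $|Z|$ under- or overflows.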

One of the crucial assumptions of the Gaussian model is that the input signal is Gaussian-distributed. The validity of this assumption for real biological systems is not clear, since typical stimulus distributions have not been measured in most systems. As noted above, we also assume that the marginal or transfer probabilities between any two signal points are Gaussian. In general, this is not exactly true for biochemical systems. For systems that do not have Gaussian statistics, Mitra and Stark [29] showed that an appropriate Gaussian model provides a lower bound on the information rate or channel capacity of the non-Gaussian network. For completeness we reproduce here the arguments of the proof by Mitra and Stark [29]:

1. The channel capacity subject to a power constraint on S is defined as $C(S,X) = \max_{p(s)} I(S,X)$, where the maximisation is over all input distributions satisfying the constraint.

2. We can construct a Gaussian input distribution on S satisfying the power constraint, $p_G(s)$. From the definition of the channel capacity, $C(S,X) \geq I(S_G,X)$, since the Gaussian is not necessarily the optimal input distribution for the channel.

3. We can also construct a multivariate Gaussian model with the same second moments as the non-Gaussian system when the input distribution is chosen to be $p_G(s)$. The mutual information in this case satisfies $I(S_G,X_G) \leq I(S_G,X)$. This is essentially because a Gaussian distribution has the largest entropy for a given variance, such that a Gaussian transfer function maximises the uncertainty of the output for a given input.

4. In summary, $I(S_G,X_G) \leq I(S_G,X) \leq C(S,X)$.

For a Gaussian system, the mutual information can be calculated exactly, and the mutual information equals the channel capacity. For systems with a Gaussian input distribution but which are otherwise non-Gaussian, the mutual information is bounded from below by the mutual information for the Gaussian model with the same second moments as the non-Gaussian system. For a general non-Gaussian system with a power constraint, the mutual information calculated in this way is a lower bound on the channel capacity.

Van Kampen's linear noise approximation (LNA) [30] provides a prescription by which we can approximate a network of interest by one which satisfies the requirements of the Gaussian model. In this approximation we assume that the intrinsic noise in the network is Gaussian-distributed and small relative to the mean, and we linearise the network response around steady state. For linear systems it is known that the second moments which are calculated in the LNA are exact [31]. Thus for a linear system the LNA can be used to estimate a lower bound on information transmission, which becomes exact in the limit of small Gaussian noise. However, for non-linear systems we are not guaranteed that the second moments calculated in the LNA are the same as those of the full non-linear system. Therefore, the LNA does not necessarily lead to an appropriate Gaussian model of the network in the sense of providing a bound on the information rate. For some non-linear systems the LNA has also been found to provide an accurate description of the second moments of networks [?, ?, ?, 33], particularly for systems with large molecular copy numbers where non-linear effects are negligible. Crucially, though, it is not necessarily obvious a priori for a given non-linear network whether or not the LNA will provide an accurate model. Hence if one wishes to consider non-linear networks in the LNA, it is important to verify that the second moments of the approximate model system match those of the full network, for example with stochastic simulations. In the remainder of this paper we will focus only on linear systems for which the second moments can be calculated exactly.

In the model as described above there is no requirement that the system is in a macroscopic steady state [10]. However, henceforth we shall assume that our systems are in such a steady state, as this simplifies the calculation of the correlation functions. Additionally, in steady state the correlation functions depend only on time differences, $C(t,t') = C(t'-t)$, which restricts the form of the covariance matrices and facilitates calculation of the determinants. Furthermore, the assumption of steady state simplifies the interpretation of the calculated information values.



A. Mutual information between instantaneous measurements

In this section we discuss the mutual information between the instantaneous value of the output signal, $X(t_0)$, and the input signal at the same point in time, $S(t_0)$. The instantaneous mutual information tells us, if we know the output of the network at a particular time, how much we learn about the current state of the input process. We note that this differs from previous analyses of the mutual information for constant signals [12–?, 17–20], because S remains a dynamic variable which changes over time. In particular, we take into account the fact that the correlation timescale of the input signal and the response time of the processing network are both finite. The interplay of changes in the signal and response will be particularly important when these timescales are comparable. The instantaneous mutual information considers not only the "intrinsic" noise in X at constant S due to the stochastic nature of the production and decay reactions, but also the "extrinsic" variability in X which arises from changes in the input signal.

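An instantaneous measurement of the input and output signals represents a special case of the Gaussian model in which the input and output vectors are each one-dimensional,

$$ p(s,x) = \frac{1}{2\pi|Z|^{1/2}} \exp\left[-\frac{1}{2}\,(s\;\;x)\, Z^{-1} \begin{pmatrix} s \\ x \end{pmatrix}\right] \qquad (6) $$

and the elements of the matrix $Z$ are the instantaneous covariances, $\sigma^2_{\alpha\beta} = \langle \alpha(t_0)\beta(t_0)\rangle$:

$$ Z = \begin{pmatrix} \sigma^2_{ss} & \sigma^2_{sx} \\ \sigma^2_{sx} & \sigma^2_{xx} \end{pmatrix} \qquad (7) $$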



Since we allow both the input and output signals to vary in time, these covariances include the interplay of the input and output signal timescales. From Eq. (6) it follows that the conditional distribution of $x$ given $s$ is

$$ p(x|s) = \frac{1}{(2\pi\sigma^2_{x|s})^{1/2}} \exp\left[-\frac{(x - \langle x|s\rangle)^2}{2\sigma^2_{x|s}}\right] \qquad (8) $$

where $\langle x|s\rangle = \sigma^2_{sx}s/\sigma^2_{ss}$ and $\sigma^2_{x|s} = |Z|/\sigma^2_{ss}$. We can therefore define the network gain $g = \sigma^2_{sx}/\sigma^2_{ss}$ and intrinsic noise $\sigma^2_{x|s} = \sigma^2_{xx} - g^2\sigma^2_{ss}$, and the output signal of the network takes the form $x = gs + \eta_{x|s}$, where $\eta_{x|s}$ is a Gaussian-distributed random variable with variance $\sigma^2_{x|s}$. This is the canonical form of the additive white Gaussian noise channel considered by Shannon and others [?, 13, ?]. We note that in general the gain $g$ defined above differs from the "macroscopic" gain in the steady-state values of S and X, $\partial\langle X\rangle/\partial\langle S\rangle$, which characterises the transmission of constant signals.

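From the definition of the mutual information,

$$ I_{\rm inst}(S,X) = \frac{1}{2}\log\frac{\sigma^2_{ss}\sigma^2_{xx}}{|Z|} = \frac{1}{2}\log\left(1 + \frac{g^2\sigma^2_{ss}}{\sigma^2_{x|s}}\right) \qquad (9) $$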



By comparison of Eq. (9) with the well-known result for the capacity of a Gaussian channel [?], we can interpret this result in terms of the signal-to-noise ratio:

$$ \frac{\text{Signal}}{\text{Noise}} = \frac{\sigma^4_{sx}}{|Z|} = \frac{g^2\sigma^2_{ss}}{\sigma^2_{x|s}} \qquad (10) $$



In essence, to reliably detect an input signal $s$ the network gain has to raise the output signal $x$ above the noise level $\sigma_{x|s}$. The gain-to-noise ratio $g^2/\sigma^2_{x|s}$ provides a signal-independent measure of the performance of the network. For a given input signal, we therefore expect that the mutual information is maximised when the ratio $g^2/\sigma^2_{x|s}$ is maximised. The gain-to-noise ratio is also the Fisher information [35] about the signal $s$ contained in the sample $x$. For a Gaussian system this is the reciprocal of the mean-squared error in estimating $s$ from a given output $x$ with the unbiased estimator $\hat{s} = x/g$.
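The quantities in Eqs. (8)–(10) follow directly from the three entries of Eq. (7). The sketch below assumes NumPy, natural logarithms and illustrative covariance values that are not taken from any specific network; `instantaneous_channel` is a hypothetical helper name. A short Monte Carlo check confirms that the estimator $\hat{s} = x/g$ has error variance equal to the reciprocal of the gain-to-noise ratio.

```python
import numpy as np

def instantaneous_channel(var_ss, var_sx, var_xx):
    """Gain, intrinsic noise, signal-to-noise ratio, instantaneous mutual
    information (nats) and gain-to-noise ratio for one Gaussian measurement."""
    det_z = var_ss * var_xx - var_sx**2           # |Z| of the 2x2 matrix in Eq. (7)
    gain = var_sx / var_ss                        # g = sigma^2_sx / sigma^2_ss
    noise = det_z / var_ss                        # sigma^2_{x|s} = |Z| / sigma^2_ss
    snr = gain**2 * var_ss / noise                # Eq. (10)
    info = 0.5 * np.log(var_ss * var_xx / det_z)  # Eq. (9), first form
    gain_to_noise = gain**2 / noise               # Fisher information about s
    return gain, noise, snr, info, gain_to_noise

# Illustrative (hypothetical) covariances
g, noise, snr, info, gtn = instantaneous_channel(var_ss=4.0, var_sx=2.0, var_xx=3.0)

# Monte Carlo check: x = g*s + eta reproduces Eq. (7); s_hat = x/g has error variance 1/gtn
rng = np.random.default_rng(0)
s = rng.normal(0.0, 2.0, size=200_000)            # variance 4 = var_ss
x = g * s + rng.normal(0.0, np.sqrt(noise), size=s.size)
mse = np.mean((x / g - s) ** 2)
```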



B. Mutual information rate for infinite trajectories 

A second special case of the Gaussian model is the limit of infinite, continuous trajectories. We define the entropy rate of a Gaussian process as

$$ h_G = \lim_{N\to\infty} \frac{H_G}{N\Delta} \qquad (11) $$

where $\Delta$ is the sampling interval of the signal and $T = N\Delta$ is the length of the trajectory. Since we assume that the network is fluctuating around steady state, the covariance between the input (or output) signal at two time points depends only on the time difference between the two samples. The corresponding covariance matrix ($C_{ss}$ or $C_{xx}$) therefore has a Toeplitz structure, which allows us to rewrite the matrix determinant in the limit $N \to \infty$ in terms of the power spectrum of the signal $P(\omega)$ [?],

$$ h_G = \frac{1}{2\Delta}\log(2\pi e) + \frac{1}{4\pi}\int_{-\omega_0}^{\omega_0} d\omega\, \log P(\omega) \qquad (12) $$

where $\omega_0 = \pi/\Delta$ is the angular Nyquist frequency. In effect we have decomposed the signal into an infinite number of independent frequency components with corresponding variance $P(\omega)$. Note that in the limit $\Delta \to 0$ the entropy rate is not well defined. However, as we will see below, the mutual information rate can remain finite in this limit.
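Eq. (12) rests on Szegő's theorem for Toeplitz determinants. As a minimal check of its discrete-time analogue, assuming an AR(1) sequence (a sampled Ornstein–Uhlenbeck process) with covariance $\sigma^2\varphi^{|i-j|}$ as a hypothetical signal, the log-determinant per sample converges to the average of the log-spectrum over one period. Names and parameter values are illustrative; NumPy is assumed.

```python
import numpy as np

def ar1_covariance(n, var, phi):
    """Toeplitz covariance C_ij = var * phi**|i - j| of a stationary AR(1) sequence."""
    lags = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    return var * phi**lags

def mean_log_spectrum(var, phi, m=4096):
    """(1/2pi) * integral over [-pi, pi] of log S(theta) for the AR(1) spectrum,
    evaluated with the midpoint rule (very accurate for smooth periodic integrands)."""
    theta = -np.pi + (np.arange(m) + 0.5) * 2 * np.pi / m
    spectrum = var * (1 - phi**2) / (1 - 2 * phi * np.cos(theta) + phi**2)
    return np.mean(np.log(spectrum))

var, phi = 2.0, 0.8
limit = mean_log_spectrum(var, phi)
per_sample = {n: np.linalg.slogdet(ar1_covariance(n, var, phi))[1] / n
              for n in (10, 100, 1000)}
# per_sample[n] approaches limit; for AR(1) the exact limit is log(var * (1 - phi**2))
```

The finite-$N$ discrepancy decays like $1/N$ here, which is why Eq. (12) is stated as a limit.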

To calculate the mutual information we also need the joint entropy rate $h(S,X)$. The covariance matrix $Z$ for the combined $(s, x)$ signal is not (in general) Toeplitz. However, following [?], the joint entropy rate for a Gaussian system can be written in terms of the (cross-)power spectra of the input and output signals,

$$ h(S,X) = \frac{1}{\Delta}\log(2\pi e) + \frac{1}{4\pi}\int_{-\omega_0}^{\omega_0} d\omega\, \log\left[P_{ss}(\omega)P_{xx}(\omega) - |P_{sx}(\omega)|^2\right] \qquad (13) $$

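where $P_{\alpha\beta}(\omega)$ is the Fourier transform of $C_{\alpha\beta}(t)$. Combining this expression with Eq. (12) and taking the limit $\Delta \to 0$, the mutual information rate between continuous trajectories can be written as

$$ R(S,X) = h(S) + h(X) - h(S,X) \qquad (14) $$

$$ = -\frac{1}{4\pi}\int_{-\infty}^{\infty} d\omega\, \log\left[1 - \frac{|P_{sx}(\omega)|^2}{P_{ss}(\omega)P_{xx}(\omega)}\right] \qquad (15) $$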



We can recognise that the total information rate is the sum of independent Gaussian channels at different frequencies. The output trajectory of the network is, by construction, a stationary Gaussian process. For such a process it is known that the different Fourier components must be statistically independent [34]. In order for this to be realised in a signalling system, we require that an input signal at a specific frequency leads to an output only at a unique frequency; if the network has a response at multiple frequencies, the components of the output at these frequencies will be correlated. This is equivalent to assuming that the network response is linear. Furthermore, the noise in the network must be purely additive. As described above, the LNA provides a prescription for constructing a model system which satisfies these requirements. For systems that are non-linear or non-Gaussian, Eq. (15) provides a bound on the information rate or channel capacity, provided that the power spectra of the full non-linear system are used in the calculation of $R(S,X)$.

How should we interpret the "rate" $R(S,X)$? From Eqs. (11) and (14) we see that $R(S,X) = \lim_{T\to\infty} I(S,X;T)/T$, the total mutual information for two long trajectories divided by the trajectory length. Thus, $R(S,X)$ is the average mutual information per unit time for a long trajectory. From this definition it also follows that $R(S,X) = \lim_{T\to\infty} dI(S,X;T)/dT$; that is, $R(S,X)$ is the rate at which we gain information about the input trajectory as the length of both the input and output trajectories is increased. However, some care should be taken with this interpretation. If we record an output trajectory of length $T$, and then measure for an additional time $T'$, the total information we have about the input trajectory as a whole will increase by approximately $T'R(S,X)$. However, the additional information we have gained will not be restricted to the new segment of the input signal, but will also be distributed over the original trajectory. Therefore, it is not correct to say that we have learnt $T'R(S,X)$ about the segment $T < t < T + T'$ of the input trajectory. More generally, there may be additional contributions to the mutual information which are important for short trajectories. When $T$ is comparable to the correlation times of the network we would expect $I(S,X;T)/T$ to deviate from $R(S,X)$.
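A discrete-time sketch of these two limits can be built from Eq. (5), assuming a hypothetical channel in which an AR(1) input is observed as $x_k = g s_k + w_k$ with white Gaussian noise $w_k$. Rates here are per sample rather than per unit time, and all names and parameter values are illustrative. The finite-length information comes from determinants, and the limiting rate from the discrete analogue of the signal-to-noise form of the rate integral.

```python
import numpy as np

def trajectory_information(n, var_s, phi, gain, var_noise):
    """I(S, X) in nats for length-n trajectories, using Eq. (5)."""
    lags = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    c_ss = var_s * phi**lags                        # AR(1) input covariance
    c_sx = gain * c_ss                              # <s_i x_j> = g <s_i s_j>
    c_xx = gain**2 * c_ss + var_noise * np.eye(n)   # output covariance
    z = np.block([[c_ss, c_sx], [c_sx.T, c_xx]])

    def logdet(mat):
        return np.linalg.slogdet(mat)[1]

    return 0.5 * (logdet(c_ss) + logdet(c_xx) - logdet(z))

def information_rate(var_s, phi, gain, var_noise, m=4096):
    """Per-sample rate: (1/4pi) * integral over [-pi, pi] of log(1 + SNR(theta))."""
    theta = -np.pi + (np.arange(m) + 0.5) * 2 * np.pi / m
    spectrum = var_s * (1 - phi**2) / (1 - 2 * phi * np.cos(theta) + phi**2)
    return 0.5 * np.mean(np.log1p(gain**2 * spectrum / var_noise))

params = dict(var_s=1.0, phi=0.8, gain=1.0, var_noise=0.5)
rate = information_rate(**params)
info = {n: trajectory_information(n, **params) for n in (1, 10, 199, 200)}
average_rate = info[200] / 200            # I(T)/T: approaches the rate like 1/T
incremental_rate = info[200] - info[199]  # dI/dT: converges much faster
```

The slow $1/T$ approach of the average reflects exactly the short-trajectory contributions discussed above, while the increment settles onto the rate almost immediately.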



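The ratio

$$ \phi(\omega) = \frac{|P_{sx}(\omega)|^2}{P_{ss}(\omega)P_{xx}(\omega)}, $$

which appears in Eq. (15), is the coherence between $s(t)$ and $x(t)$. This is a standard measure of the correlation between the signals $s$ and $x$, with $\phi(\omega) = 0$ for independent signals and $\phi(\omega) = 1$ when $s$ completely determines $x$. We can also define the signal and noise power spectra via analogous expressions to those for instantaneous measurements,

$$ E(\omega) = g^2(\omega)P_{ss}(\omega) \qquad (16) $$

$$ N(\omega) = P_{xx}(\omega) - g^2(\omega)P_{ss}(\omega) \qquad (17) $$

where, by analogy with $g = \sigma^2_{sx}/\sigma^2_{ss}$, the frequency-dependent gain satisfies $g^2(\omega) = |P_{sx}(\omega)|^2/P_{ss}^2(\omega)$.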



We can see that the coherence represents the signal fraction of the total output power, $\phi(\omega) = E(\omega)/[E(\omega) + N(\omega)]$. The mutual information rate can also be written in terms of the signal-to-noise ratio,

$$ R(S,X) = \frac{1}{4\pi}\int_{-\infty}^{\infty} d\omega\, \log\left[1 + \frac{E(\omega)}{N(\omega)}\right] \qquad (18) $$

recovering the usual expression for the capacity of a continuous Gaussian channel [13, ?].
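To illustrate that Eqs. (15) and (18) are two forms of the same integral, here is a minimal sketch assuming NumPy and a hypothetical linear channel: an Ornstein–Uhlenbeck input with $P_{ss}(\omega) = 2\sigma^2\gamma/(\omega^2+\gamma^2)$ observed as $x = gs$ plus white noise of spectral density $N_0$. For this choice $1 + E/N = (\omega^2 + \gamma^2 + c)/(\omega^2 + \gamma^2)$ with $c = 2g^2\sigma^2\gamma/N_0$, so Eq. (19) gives the closed form $R = \tfrac{1}{2}(\sqrt{\gamma^2 + c} - \gamma)$ nats per unit time. The channel, parameter values and helper names are illustrative assumptions.

```python
import numpy as np

# Hypothetical channel parameters (illustrative only)
VAR_S, GAMMA, GAIN, N0 = 1.0, 1.0, 2.0, 0.5

def spectra(omega):
    """Power spectra of an OU input s and output x = GAIN * s + white noise."""
    p_ss = 2 * VAR_S * GAMMA / (omega**2 + GAMMA**2)
    p_sx = GAIN * p_ss
    p_xx = GAIN**2 * p_ss + N0
    return p_ss, p_sx, p_xx

def rate_integral(log_integrand, m=200_000):
    """(1/4pi) * integral over the real line, using omega = tan(theta)
    and the midpoint rule on theta in (-pi/2, pi/2)."""
    theta = -np.pi / 2 + (np.arange(m) + 0.5) * np.pi / m
    omega = np.tan(theta)
    jacobian = 1 + omega**2                   # d(omega)/d(theta)
    return np.sum(log_integrand(omega) * jacobian) * (np.pi / m) / (4 * np.pi)

def coherence_integrand(omega):
    p_ss, p_sx, p_xx = spectra(omega)
    coherence = np.abs(p_sx) ** 2 / (p_ss * p_xx)
    return -np.log1p(-coherence)              # integrand of Eq. (15)

def snr_integrand(omega):
    p_ss, p_sx, p_xx = spectra(omega)
    gain_sq = np.abs(p_sx) ** 2 / p_ss**2     # g^2(omega)
    signal = gain_sq * p_ss                   # E(omega), Eq. (16)
    noise = p_xx - signal                     # N(omega), Eq. (17)
    return np.log1p(signal / noise)           # integrand of Eq. (18)

rate_from_coherence = rate_integral(coherence_integrand)
rate_from_snr = rate_integral(snr_integrand)
c = 2 * GAIN**2 * VAR_S * GAMMA / N0
rate_closed_form = 0.5 * (np.sqrt(GAMMA**2 + c) - GAMMA)
```

The substitution $\omega = \tan\theta$ maps the infinite frequency axis onto a finite interval on which the integrand stays bounded, so a plain midpoint rule suffices.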

From [26] and the above, we can see that the ability of a network to transmit information in a time-varying signal at a particular frequency is characterised by the frequency-dependent gain-to-noise ratio, $g^2(\omega)/N(\omega)$. To understand signalling performance it is therefore important to consider both the gain and the noise of the network, and not simply the gain or noise in isolation. Furthermore, for systems which satisfy the spectral addition rule [32], meaning that the network dynamics do not affect the input signal itself, the gain-to-noise ratio is signal-independent and characterises the intrinsic transmission characteristics of the network. Knowledge of the gain-to-noise ratio also allows us to calculate the optimal signal that maximises the information rate for a given network, through the "water-filling" approach of Fano [34]. Finally, in the same way as for instantaneous measurements, the gain-to-noise ratio at a specific frequency is the Fisher information provided by the output signal $x(\omega)$ about the input signal $s(\omega)$ at this frequency. Thus, the gain-to-noise ratio is the reciprocal of our uncertainty in estimating the input signal given a particular output.
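The water-filling construction can be sketched on a discretised frequency axis. Assuming independent Gaussian sub-channels with a hypothetical gain-to-noise ratio $G_i = g^2(\omega_i)/N(\omega_i)$ and a fixed total input power, the information $\sum_i \tfrac{1}{2}\log(1 + G_i P_i)$ is maximised by $P_i = \max(0, \nu - 1/G_i)$, with the water level $\nu$ fixed by the power constraint; here $\nu$ is found by bisection. The Lorentzian form of $G$ and all parameter values are illustrative assumptions, and NumPy is assumed.

```python
import numpy as np

def water_fill(gain_to_noise, total_power, tol=1e-12):
    """Power allocation P_i = max(0, level - 1/G_i) with sum(P_i) = total_power."""
    floor = 1.0 / np.asarray(gain_to_noise, dtype=float)  # "ground" under the water
    low, high = floor.min(), floor.max() + total_power   # brackets the water level
    while high - low > tol:
        level = 0.5 * (low + high)
        if np.maximum(level - floor, 0.0).sum() > total_power:
            high = level
        else:
            low = level
    return np.maximum(low - floor, 0.0)

def total_information(gain_to_noise, power):
    """Sum over independent Gaussian sub-channels of 0.5 * log(1 + G_i P_i)."""
    return 0.5 * np.sum(np.log1p(gain_to_noise * power))

omega = np.linspace(-5.0, 5.0, 101)
gain_to_noise = 10.0 / (1.0 + omega**2)       # hypothetical g^2(omega) / N(omega)
budget = 20.0
optimal = water_fill(gain_to_noise, budget)
uniform = np.full_like(omega, budget / omega.size)
```

Frequencies with a poor gain-to-noise ratio (a high "floor" $1/G_i$) receive no power at all, which is the characteristic feature of the water-filling solution.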

We saw previously that the entropy rate of a stochastic process in the continuum limit is not well defined, but the mutual information rate can still be calculated. Under what conditions do we find a finite mutual information rate? From Eqs. (15) and (18) we can see that the mutual information rate will be divergent if the signal-to-noise ratio $E(\omega)/N(\omega)$ or the coherence $\phi(\omega)$ does not approach zero as $\omega \to \infty$. The power spectra of biochemical reactions typically take the form of ratios of polynomials in $\omega^2$ (as we shall see in the following sections). Then the integral in the mutual information rate will converge to a finite value if the signal power decreases more rapidly at high frequencies than the noise power. This characteristic form of the power spectra also often allows us to perform the integral in Eq. (15) or (18) and calculate an explicit expression for $R(S,X)$. To do this we make use of the result that, for $a, b > 0$,



$$ \int_{-\infty}^{\infty} d\omega\, \ln\frac{\omega^2 + a^2}{\omega^2 + b^2} = 2\pi(a - b) \qquad (19) $$



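Using the properties of the logarithm, this can trivially be extended to an arbitrary number of terms,

$$ \int_{-\infty}^{\infty} d\omega\, \ln\frac{\prod_{n=1}^{N}(\omega^2 + a_n^2)}{\prod_{n=1}^{N}(\omega^2 + b_n^2)} = 2\pi\sum_{n=1}^{N}(a_n - b_n) \qquad (20) $$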
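A minimal numerical check of Eq. (20), assuming NumPy: the integral over the real line is mapped to a finite interval with $\omega = \tan\theta$ and evaluated with the midpoint rule, and the $a_n$ are recovered as $a_n = \sqrt{-u_n}$ from the roots $u_n$ of a polynomial in $u = \omega^2$ with unit leading coefficient. Function names and the example polynomials are illustrative.

```python
import numpy as np

def log_ratio_integral_numeric(a, b, m=200_000):
    """Integral over the real line of ln[prod(w^2 + a_n^2) / prod(w^2 + b_n^2)]."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    theta = -np.pi / 2 + (np.arange(m) + 0.5) * np.pi / m   # midpoints in (-pi/2, pi/2)
    w2 = np.tan(theta) ** 2
    # log1p keeps precision where each ratio is close to 1 (large |omega|)
    integrand = np.log1p((a**2 - b**2) / (w2[:, None] + b**2)).sum(axis=1)
    return np.sum(integrand * (1 + w2)) * np.pi / m          # d(omega) = (1 + w^2) d(theta)

def log_ratio_integral_closed(a, b):
    """Right-hand side of Eq. (20), valid for a_n, b_n with positive real part."""
    return 2 * np.pi * (np.sum(a) - np.sum(b))

def roots_in_omega_squared(coeffs):
    """a_n such that the polynomial in u = omega^2 (leading coefficient 1) equals
    prod(omega^2 + a_n^2); the principal square root has a non-negative real part."""
    return np.sqrt(-np.roots(coeffs).astype(complex))

# Numerator (w^2 + 1)(w^2 + 9) = u^2 + 10 u + 9; denominator (w^2 + 4)(w^2 + 0.25)
a = np.sort(roots_in_omega_squared([1.0, 10.0, 9.0]).real)   # a = [1, 3]
b = np.sort(roots_in_omega_squared([1.0, 4.25, 1.0]).real)   # b = [0.5, 2]
numeric = log_ratio_integral_numeric(a, b)
closed = log_ratio_integral_closed(a, b)                       # 2*pi*(4 - 2.5)
```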



We emphasise that for the integral to be cast in this form we require that the coefficient of the leading-order term in both the denominator and numerator of Eq. (20) is 1. Then calculating $R(S,X)$ reduces to finding the roots of polynomials which are constructed from the network power spectra.

Motif     Reactions
(I)       ∅ → S,  S → ∅,  S + W ⇌ X
(II)      ∅ → S,  S → ∅,  S → X,  X → ∅
(III)     ∅ → S,  S → ∅,  S → S + X,  X → ∅

TABLE I: Summary of the three elementary reaction motifs. In motif I we assume that the number of W proteins is sufficiently large that the rate ρ can be taken to be constant.

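Can we still compare the performance of networks with a divergent information rate? In these networks the coherence and signal-to-noise ratio approach some non-zero value as $\omega \to \pm\infty$. Rather than Eq. (20), the integral in the expression for the mutual information rate will have the form

$$ \int_{-\omega_0}^{\omega_0} d\omega\, \ln\left[k\,\frac{\prod_{n=1}^{N}(\omega^2 + a_n^2)}{\prod_{n=1}^{N}(\omega^2 + b_n^2)}\right] = 2\omega_0\ln k + 2\omega_0\ln\frac{\prod_{n=1}^{N}(\omega_0^2 + a_n^2)}{\prod_{n=1}^{N}(\omega_0^2 + b_n^2)} + 4\sum_{n=1}^{N}\left(a_n\arctan\frac{\omega_0}{a_n} - b_n\arctan\frac{\omega_0}{b_n}\right) \qquad (21) $$

In the limit $\Delta \to 0$, and hence $\omega_0 = \pi/\Delta \to \infty$, the second term vanishes and the third tends to the finite value $2\pi\sum_n(a_n - b_n)$, so the last two terms can be neglected compared with the first, which grows linearly with $\omega_0$. Information transmission for these networks is therefore dominated by high-frequency components, and can be characterised by the constant $k$. The values of $k$ for different networks can therefore be used to compare their relative information rates.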
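Eq. (21) can be checked in the same way on a finite band $[-\omega_0, \omega_0]$. The sketch below, assuming NumPy and illustrative values of $k$, $a_n$ and $b_n$, compares a midpoint-rule integral with the closed form and shows that, after removing $2\omega_0\ln k$, the remainder approaches $2\pi\sum_n(a_n - b_n)$ as $\omega_0$ grows.

```python
import numpy as np

def band_integral_numeric(k, a, b, omega0, m=200_000):
    """Midpoint-rule integral over [-omega0, omega0] of
    ln[k * prod(w^2 + a_n^2) / prod(w^2 + b_n^2)]."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    w = -omega0 + (np.arange(m) + 0.5) * 2 * omega0 / m
    w2 = w[:, None] ** 2
    integrand = np.log(k) + np.log((w2 + a**2) / (w2 + b**2)).sum(axis=1)
    return np.sum(integrand) * 2 * omega0 / m

def band_integral_closed(k, a, b, omega0):
    """Right-hand side of Eq. (21)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return (2 * omega0 * np.log(k)
            + 2 * omega0 * np.sum(np.log1p((a**2 - b**2) / (omega0**2 + b**2)))
            + 4 * np.sum(a * np.arctan(omega0 / a) - b * np.arctan(omega0 / b)))

k, a, b = 2.0, [1.0, 3.0], [2.0, 0.5]
numeric = band_integral_numeric(k, a, b, omega0=50.0)
closed = band_integral_closed(k, a, b, omega0=50.0)
# Remainder after subtracting the divergent term, for a very wide band:
remainder = band_integral_closed(k, a, b, omega0=1e6) - 2 * 1e6 * np.log(k)
```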



III. RESULTS

In this section we present results for some elementary molecular reactions, considered previously [?, 32] and summarised in Table I. These reaction schemes are significant because they exemplify the three basic ways in which S can directly drive the production of X: reversible conversion between S and X; irreversible conversion from S to X; and stimulating production of X without consuming an S molecule. Since these schemes feature only first-order reactions, the covariances and power spectra can be calculated exactly from the chemical master equation [?].

A. Instantaneous mutual information

1. Motif I: Reversible

This motif describes the association and dissociation of a signalling molecule, S, and a receptor, W, to form an active complex, X. We assume that W is present at high copy numbers, such that its depletion due to binding S can be neglected. In this case we consider the "signal" we wish to detect to be the total number of both bound and unbound signalling molecules, $S_T(t_0) = S(t_0) + X(t_0)$. For this motif the covariances are

$$ \sigma^2_{S_TS_T} = \langle S_T\rangle, \qquad \sigma^2_{S_TX} = \sigma^2_{XX} = \langle X\rangle \qquad (22) $$

which gives for the instantaneous mutual information

$$ I_{\rm inst}(S_T,X) = -\frac{1}{2}\log\left(1 - \frac{\langle X\rangle}{\langle S_T\rangle}\right) \qquad (23) $$

The instantaneous mutual information between X and $S_T$ is determined simply by the fraction of $S_T$ which is in the bound (X) state on average. Indeed, the average statistics of this binding reaction are simply those of a binomial distribution. Each molecule in the system can be in two states: the bound X state, with a probability $\langle X\rangle/\langle S_T\rangle$ set by the ratio of the binding and unbinding rates, or the unbound S state, with probability $\langle S\rangle/\langle S_T\rangle$. Since each molecule is independent, if there are $N_{S_T}$ molecules in the system in total, the expected number of X molecules will be $N_{S_T}\langle X\rangle/\langle S_T\rangle = gN_{S_T}$, and the variance in the number of X molecules will be $N_{S_T}g(1-g)$. Averaging over all possible values of $S_T$, the intrinsic noise in X is $\sigma^2_{X|S_T} = \langle S_T\rangle g(1-g)$. For instantaneous signals, the timescales of the binding and dissociation reactions are not important for information transmission, but only their ratio. This is because we wish to estimate only the current state of the system, and we are not concerned with how rapidly the state of the system changes.

2. Motif II: Irreversible modification



This motif is characterised by the irreversible conversion of an S molecule to an X, S → X. Such a reaction could represent irreversible post-translational modification of a protein such as cleavage, or binding of a ligand to a receptor followed by rapid endocytosis of the resulting complex. For this motif the covariance $\sigma^2_{sx} = 0$. The number of X molecules present in the system depends only on how many S molecules have been converted to X in the past. If the production of different S molecules occurs independently, then the instantaneous values of S and X are uncorrelated, since the S molecules that have previously decayed into X are uncorrelated with those that are currently present in the system. Hence the instantaneous mutual information between a single measurement of X and the simultaneous value of S is zero; measuring X tells us nothing about how many S molecules are currently present in the system.

Note, however, that if there are correlations between the production of different S molecules, for example if S molecules are produced in bursts, then the instantaneous S and X values become correlated and the instantaneous mutual information will be non-zero. Suppose that S molecules are instead produced via the reaction ∅ → nS. Then the mutual information between instantaneous values of S and X is given by

$$I_{\rm inst}(S,X) = -\frac{1}{2}\log\left(1 - \frac{(n-1)^2\rho\mu}{(n+1)(\lambda+\rho+\mu)\left(2\lambda+2\mu+(n+1)\rho\right)}\right). \qquad (24)$$



Interestingly, for a fixed production rate of X, ρ, the mutual information can be optimised by choosing

$$\mu_{\rm opt} = \sqrt{(\lambda+\rho)\left(\lambda + \frac{(n+1)\rho}{2}\right)}, \qquad (25)$$

while for a fixed degradation rate of X, μ, information is maximised for

$$\rho_{\rm opt} = (\lambda+\mu)\sqrt{\frac{2}{n+1}}. \qquad (26)$$



These optima are the result of a trade-off between two probabilities. The first is the probability that, when we observe an X molecule, other S molecules produced in the same burst are still in the system. The second is the probability of observing an X molecule at all, as depicted in Fig. 1. For example, if the production rate of X, ρ, is too large, all S molecules rapidly decay to X. Since the probability of S molecules from the same burst remaining in the system is then low, we lose the ability to predict the value of S. If ρ is too small, the chance of even a single X molecule being produced in a burst becomes small and we again lose information, this time about the bursts which go undetected. Similar arguments apply for μ. If μ is too small, then over the lifetime of X the S molecules from the same burst will typically either have been degraded or also have decayed to X. If we make μ too large, then we very rarely find an X in the system, and hence on average we gain little information about S. Note that Eqs. 25 and 26 cannot both be satisfied simultaneously: there are no isolated optimal parameter combinations.
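Eqs. 24 to 26 can be checked numerically. The Python sketch below evaluates Eq. 24 and locates its maxima by a brute-force grid search. It uses the burst size and rates quoted in the Fig. 1 caption (n = 2, λ = 0.1, with ρ = 0.9 or μ = 1 held fixed). The helper names and grid bounds are illustrative choices.

```python
import math


def info_inst_burst(n, lam, rho, mu):
    """Eq. (24): instantaneous MI (nats) for motif II with S produced in bursts of size n."""
    r = (n - 1) ** 2 * rho * mu / (
        (n + 1) * (lam + rho + mu) * (2 * lam + 2 * mu + (n + 1) * rho)
    )
    return -0.5 * math.log(1.0 - r)


def mu_opt(n, lam, rho):
    """Eq. (25): optimal X decay rate at fixed conversion rate rho."""
    return math.sqrt((lam + rho) * (lam + (n + 1) * rho / 2))


def rho_opt(n, lam, mu):
    """Eq. (26): optimal conversion rate at fixed X decay rate mu."""
    return (lam + mu) * math.sqrt(2.0 / (n + 1))


def grid_argmax(f, lo, hi, steps=200000):
    """Brute-force maximiser on a uniform grid (adequate for a smooth 1D check)."""
    best_x, best_val = lo, f(lo)
    for i in range(1, steps + 1):
        x = lo + (hi - lo) * i / steps
        val = f(x)
        if val > best_val:
            best_x, best_val = x, val
    return best_x


n, lam = 2, 0.1
mu_num = grid_argmax(lambda mu: info_inst_burst(n, lam, 0.9, mu), 1e-3, 10.0)
rho_num = grid_argmax(lambda rho: info_inst_burst(n, lam, rho, 1.0), 1e-3, 10.0)
print(mu_num, mu_opt(n, lam, 0.9))    # both approximately 1.204
print(rho_num, rho_opt(n, lam, 1.0))  # both approximately 0.898
```

With these values the optimal ρ for μ = 1 lies close to 0.9, and for n = 1 (no bursts) Eq. 24 gives zero information, as expected for independent production.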








FIG. 1: Probability of a molecule produced at t = 0 being in each molecular state of motif II as a function of time for different reaction rates. (a) When ρ ≫ μ, λ, S molecules rapidly decay to X, but for most of the lifetime of X the chance of an S molecule from the same burst being present is small. Parameters: μ = 0.1, λ = 1, ρ = 5. (b) When μ ≫ ρ, λ, the X molecules produced from S are rapidly degraded. Therefore the probability of observing an X molecule is small, and little information can be gained. Parameters: μ = 10, λ = 0.1, ρ = 1. (c) Information is maximised when there is a significant probability of finding both S and X molecules. The figure shows an optimal choice of ρ for burst size n = 2, with μ = 1 and λ = 0.1: ρ = 0.9.



3. Motif III: Production 



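This motif may represent an effective coarse-grained model of an enzymatic reaction or of protein production in which fast reaction steps have been integrated out. For this motif the covariances are:

$$\sigma^2_S = \langle S\rangle, \qquad \sigma^2_{SX} = g\langle S\rangle, \qquad \sigma^2_X = \langle X\rangle\left(1 + \frac{\rho}{\lambda+\mu}\right). \qquad (27)$$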



The gain is given by g = ρ/(λ+μ), and describes the average response of the output signal X to a perturbation in the input signal. Suppose we initially have no S or X molecules present in the system. We then introduce a single S molecule at t = 0. The survival probability for this S molecule, the probability that it has not yet decayed at time t, decays exponentially, so that its lifetime is exponentially distributed with mean 1/λ. If the S molecule has not decayed by time t, the mean number of X molecules present will be $X(t) = \rho[1 - e^{-\mu t}]/\mu$. Now we observe the system at a time t > 0 chosen according to the probability that the S molecule is still present. This time will then be drawn from the distribution $p(t) = e^{-\lambda t}/\int_0^\infty e^{-\lambda t'}dt' = \lambda e^{-\lambda t}$, and the mean number of X molecules that we will observe is $\int_0^\infty p(t)X(t)\,dt = \rho/(\lambda+\mu)$. We can therefore see that the gain directly measures the typical change we observe in the number of X molecules if the input signal changes by one S molecule.
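The averaging argument for the gain can be reproduced by Monte Carlo. This Python sketch samples observation times from $\lambda e^{-\lambda t}$ and averages $X(t) = \rho[1-e^{-\mu t}]/\mu$. The rate values are arbitrary illustrations.

```python
import math
import random


def gain_monte_carlo(rho, lam, mu, n_samples=200000, seed=0):
    """Average X(t) over observation times t ~ lam * exp(-lam t).

    X(t) = rho * (1 - exp(-mu t)) / mu is the mean number of X molecules made by a
    single S molecule that is still present at time t (motif III).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        t = rng.expovariate(lam)  # observation time weighted by S survival
        total += rho * (1.0 - math.exp(-mu * t)) / mu
    return total / n_samples


rho, lam, mu = 2.0, 1.0, 0.5  # illustrative rates, not from the text
estimate = gain_monte_carlo(rho, lam, mu)
print(estimate, rho / (lam + mu))  # both close to 1.333
```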

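The instantaneous mutual information is given by

$$I_{\rm inst}(S,X) = -\frac{1}{2}\log\left(1 - \frac{\rho\mu}{(\lambda+\mu)(\lambda+\mu+\rho)}\right). \qquad (28)$$

We see that $I_{\rm inst}(S,X)$ has a maximum as a function of μ at

$$\mu_{\rm opt} = \sqrt{\lambda(\lambda+\rho)}. \qquad (29)$$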



This optimal value appears as a result of the interplay between intrinsic noise and temporal correlations in the system. When μ ≫ λ, intrinsic fluctuations in X are much more rapid than the systematic changes in X due to variations in S. In this case, the instantaneous statistics of X are approximately those of a simple Poisson birth-death process in response to a constant S input. The gain is g = ρ/(μ+λ) ≈ ρ/μ = ⟨X⟩/⟨S⟩, the number of X molecules per S which would be observed for a constant S signal. The noise variance in X is approximately that of a Poisson process, $\sigma^2_{X|S} \approx \langle X\rangle \propto \mu^{-1}$. While increasing μ decreases the absolute noise strength, the relative noise $\sigma^2_{X|S}/\langle X\rangle^2$ increases. Since signalling fidelity depends on the ratio $g^2/\sigma^2_{X|S} \propto \langle X\rangle^2/\sigma^2_{X|S}$, increasing μ decreases the gain-to-noise ratio and reduces the transmitted information. In the opposite limit, when μ ≪ λ, the output signal X effectively integrates over variations in S. In this case the gain is g = ρ/(μ+λ) ≈ ρ/λ, the mean number of X molecules produced during the lifetime of an S molecule. Since the typical lifetime of X is long compared to that of S, we can assume that no X molecule decays before the S from which it was produced. Once we are in this regime, further decreasing μ has no effect on our ability to amplify the incoming signal, which is instead limited by the lifetime of S; integrating for a longer time provides no further benefit. The intrinsic noise variance, however, is still proportional to ⟨X⟩ and hence increases as μ is decreased. As a result, for small μ the gain-to-noise ratio scales as $g^2/\sigma^2_{X|S} \propto \mu$. The precise value of the optimal decay rate also depends on the production rate of X, ρ, since this determines the timescale of fluctuations in S to which X can respond.
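Eqs. 28 and 29 can be checked with a short Python sketch. It scans μ on a logarithmic grid, with λ = 1 and ρ = 5 picked arbitrarily, and recovers the maximum at $\sqrt{\lambda(\lambda+\rho)}$.

```python
import math


def info_inst_motif3(lam, rho, mu):
    """Eq. (28): instantaneous mutual information (nats) between S and X in motif III."""
    return -0.5 * math.log(1.0 - rho * mu / ((lam + mu) * (lam + mu + rho)))


lam, rho = 1.0, 5.0  # illustrative rates
mu_star = math.sqrt(lam * (lam + rho))  # Eq. (29)

# Scan mu over six decades on a logarithmic grid and keep the maximiser
grid = [10 ** (-3 + 6 * i / 60000) for i in range(60001)]
mu_best = max(grid, key=lambda mu: info_inst_motif3(lam, rho, mu))
print(mu_best, mu_star)  # both close to sqrt(6) = 2.449...
```

The information falls off on both sides of the optimum, reflecting the two limits discussed above.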



B. Mutual information rate

For the reactions shown in Table I the power spectra can readily be calculated [31, 32]. Some results for these motifs were presented in [26]. Here we extend the discussion of these results.



1. Motif I

For reaction motif I we again consider the input signal to be $s_T(t) = s(t) + x(t)$. For this motif, the gain-to-noise ratio between the signals $s_T(t)$ and x(t) is given by

$$\frac{g^2(\omega)}{N(\omega)} = \frac{\rho\lambda(\rho+\lambda+\mu)^2}{2k\left[\omega^2 + (\rho+\mu)^2 + \rho\lambda\right]}. \qquad (30)$$



We see that the information capacity of the network decreases at high frequencies. This network motif effectively acts as a low-pass filter for information. The binding and unbinding reactions cannot track extremely rapid changes in S or X, and therefore high-frequency components of the $s_T(t)$ signal are not transmitted to x(t).

The mutual information rate for this motif can be calculated by performing the integral in Eq. 15 and is given (in nats per unit time) by

$$R(S_T,X) = \frac{\lambda}{2}\left[\sqrt{1+\beta} + \sqrt{\beta + (\beta+\alpha)^2} - (1+\alpha+\beta)\right], \qquad (31)$$

where α = μ/λ and β = ρ/λ are respectively the rates of the dissociation and association reactions relative to the decay rate (inverse lifetime) of S. The mutual information rate increases with increasing β and decreases with increasing α. Therefore, as for instantaneous measurements, the amount of information about $s_T(t)$ which we can extract from the trace x(t) increases as the average fraction of molecules in the X state, which is determined by ρ/μ = β/α, increases. However, unlike the instantaneous mutual information, the information rate between trajectories does depend on the absolute binding and unbinding rates and not just their ratio. This is because the mutual information rate takes into account the timescale on which the number of X molecules is able to track changes in the number of S molecules. It should also be noted that changing ρ or μ also affects the statistics of the input signal $s_T(t)$, since molecules are protected from degradation while in the X state. This means that the entropy of the input distribution will change as ρ or μ is varied.
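Eq. 31 can be cross-checked against a direct numerical integration. This Python sketch assumes that Eq. 15 has the standard Gaussian-channel form $R = -\frac{1}{4\pi}\int d\omega\,\ln[1-\phi(\omega)] = \frac{1}{4\pi}\int d\omega\,\ln[1+\mathrm{SNR}(\omega)]$. It also assumes a signal-to-noise ratio $\mathrm{SNR}(\omega) = \rho\lambda(\lambda+\rho+\mu)^2/|D(\omega)|^2$ with $|D|^2 = (\lambda\mu-\omega^2)^2 + \omega^2(\lambda+\rho+\mu)^2$. That expression is our own linear-noise derivation for motif I and is not stated in the text. The quadrature scheme and rates are illustrative.

```python
import math


def rate_from_snr(snr, scale, n=20000):
    """R = (1/4pi) * integral over all w of ln(1 + SNR(w)), for an even SNR(w).

    Substitutes w = scale * tan(theta) on [0, pi/2) and applies the midpoint rule.
    """
    h = (math.pi / 2) / n
    total = 0.0
    for i in range(n):
        theta = (i + 0.5) * h
        w = scale * math.tan(theta)
        total += math.log1p(snr(w)) * scale / math.cos(theta) ** 2
    return 2.0 * total * h / (4.0 * math.pi)


def snr_motif1(w, lam, rho, mu):
    """Assumed linear-noise SNR of motif I (own derivation, not from the text)."""
    d2 = (lam * mu - w**2) ** 2 + w**2 * (lam + rho + mu) ** 2
    return rho * lam * (lam + rho + mu) ** 2 / d2


def rate_motif1(lam, rho, mu):
    """Closed form of Eq. (31), in nats per unit time."""
    a, b = mu / lam, rho / lam
    return 0.5 * lam * (math.sqrt(1 + b) + math.sqrt(b + (b + a) ** 2) - (1 + a + b))


lam, rho, mu = 1.0, 2.0, 3.0  # illustrative rates
numeric = rate_from_snr(lambda w: snr_motif1(w, lam, rho, mu), scale=lam + rho + mu)
print(numeric, rate_motif1(lam, rho, mu))  # the two values should agree closely
```

Agreement between the quadrature and Eq. 31 supports the assumed SNR. Varying the arguments of `rate_motif1` also shows the rate rising with β and falling with α.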



2. Motif II

The coherence for this motif is φ(ω) = ρ/[4(ρ+λ)], independent of ω. The integral in Eq. 15 is therefore divergent, giving an infinite information rate. In reality, the integral of the mutual information rate should be truncated at some large but finite frequency, since the biochemical reactions which we have modelled as instantaneous jump processes actually take some finite time to occur. Nevertheless, we can conclude that observing the trajectory x(t) provides a large amount of information about the trajectory s(t), because for every production event of X we know when one S molecule has decayed. However, we do not have complete knowledge of the input signal s(t), and thus the coherence remains less than 1. We can see that for λ ≫ ρ, φ → 0. Since the majority of S molecules are degraded directly, and do not decay via the X form, most of the molecules which pass through the system are never observed in the output signal. Hence the fraction of the input signal which we measure will be very small. Conversely, when ρ ≫ λ, φ ≈ 1/4. In this case we observe the decay of all S molecules, but we still have some uncertainty about their production times.
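The frequency independence of the motif II coherence can be seen numerically from the linear-noise spectral matrix $P(\omega) = G B G^\dagger$, with $G = (i\omega - J)^{-1}$. In this Python sketch, the reaction scheme and the entries of the drift matrix J and noise matrix B are our own assumptions for illustration. The scheme is ∅ → S (rate k), S → ∅ (λ), S → X (ρ) and X → ∅ (μ). For these first-order reactions the linear-noise spectra should coincide with the exact ones.

```python
def spectral_matrix(omega, jac, diff):
    """P(w) = G B G^dagger with G = (i w I - J)^{-1}, for a 2x2 linear-noise model."""
    m = [[1j * omega - jac[0][0], -jac[0][1]],
         [-jac[1][0], 1j * omega - jac[1][1]]]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    g = [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]
    gd = [[g[j][i].conjugate() for j in range(2)] for i in range(2)]  # Hermitian conjugate
    gb = [[sum(g[i][k] * diff[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
    return [[sum(gb[i][k] * gd[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def coherence_motif2(omega, k, lam, rho, mu):
    s_mean = k / (lam + rho)                    # steady-state <S>
    jac = [[-(lam + rho), 0.0], [rho, -mu]]     # drift matrix for (s, x)
    diff = [[2 * k, -rho * s_mean],             # S birth + S decay + S->X conversion
            [-rho * s_mean, 2 * rho * s_mean]]  # conversion couples s and x negatively
    p = spectral_matrix(omega, jac, diff)
    return abs(p[0][1]) ** 2 / (p[0][0].real * p[1][1].real)


k, lam, rho, mu = 10.0, 1.0, 3.0, 0.5  # illustrative rates
for omega in (0.0, 0.3, 1.0, 10.0, 100.0):
    print(omega, coherence_motif2(omega, k, lam, rho, mu))  # each approximately 0.1875
```

The printed values match ρ/[4(ρ+λ)] = 3/16 at every frequency, and changing μ leaves them unchanged.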

We note also that signalling fidelity, as determined by either the coherence or the signal-to-noise ratio, is independent of the decay rate of X, μ. Since decay events of X occur independently of the number of S molecules present, they contribute no information about the input signal. We can understand this by considering the timing of production and decay events in the trajectory of X. The trajectory of the number of X molecules as a function of time consists of discrete steps at which X molecules are produced and decay. Apart from a constant offset which can be absorbed into ⟨X⟩, the trajectory x(t) is completely described by the sequences of times at which the number of X molecules increases and decreases, $\{t_+\}$ and $\{t_-\}$ respectively. We can therefore write the probability of a particular trajectory as $p(x(t)) = p(\{t_+\},\{t_-\}) = p(\{t_+\})\,p(\{t_-\}|\{t_+\})$. The production of X molecules is regulated by the signal S, but their decay is not. That is, the timing of production events depends on the input s(t), but the timing of decay events is determined solely by the intrinsic dynamics of X and therefore does not depend on S explicitly. In this case, we can also factorise the conditional probability of a given trajectory, $p(x(t)|s(t)) = p(\{t_+\}|s(t))\,p(\{t_-\}|\{t_+\})$. The mutual information can then be written as



$$I(s(t),x(t)) = \int\mathcal{D}[s(t)]\int\mathcal{D}\{t_+\}\int\mathcal{D}\{t_-\}\;p(s(t),\{t_+\},\{t_-\})\,\log\frac{p(s(t))\,p(\{t_+\}|s(t))\,p(\{t_-\}|\{t_+\})}{p(s(t))\,p(\{t_+\})\,p(\{t_-\}|\{t_+\})}, \qquad (32)$$

where $\mathcal{D}\{t\}$ represents integration over all possible sequences of event times. The argument of the logarithm is independent of $\{t_-\}$, and hence the integral over $\{t_-\}$ can be performed trivially. We are left with exactly the mutual information between s(t) and $\{t_+\}$,

$$I(s(t),x(t)) = I(s(t),\{t_+\}). \qquad (33)$$

This result shows that the information about S which can be extracted from x(t) is contained specifically in the timing of X production events.

The discussion above assumes that individual production and decay steps can be resolved and that the timing of all events is known exactly. It is thus only valid if we can observe the continuous trajectory x(t) on all timescales. If, on the other hand, we only have a discrete sampling $\mathbf{x} = (x(t_1), x(t_2), \ldots, x(t_N))$, then the degradation reactions of X increase our uncertainty about S. If we observe that the number of X molecules has changed by $n = x(t_i) - x(t_{i-1})$ over the interval $[t_{i-1}, t_i]$, then we can conclude that the number of production events in the corresponding interval minus the number of degradation events must equal n. However, we do not know exactly how many production and decay events have taken place, and therefore the accuracy with which we can estimate the input trajectory s(t) is reduced.



3. Motif III



For this motif the gain-to-noise ratio is

$$\frac{g^2(\omega)}{N(\omega)} = \frac{\rho}{2\langle S\rangle}, \qquad (34)$$



independent of ω. This motif is therefore able to transmit signals at all frequencies equally well. However, we note that both the gain and noise,

$$g^2(\omega) = \frac{\rho^2}{\omega^2+\mu^2}, \qquad (35)$$

$$N(\omega) = \frac{2\rho\langle S\rangle}{\omega^2+\mu^2}, \qquad (36)$$



decrease at frequencies larger than μ. Both the input signal and the intrinsic noise in X are effectively integrated over the lifetime of X molecules, 1/μ, and so are attenuated at high frequencies. It should be noted that while information can be reliably encoded at high frequencies in the signal x(t), the power associated with these variations, $g^2(\omega)P_{ss}(\omega)$, may be small. Therefore, if this signal is taken as the input to another downstream process, these signals may be difficult to decode. For example, intrinsic noise in the detection of X by the downstream network may overwhelm the small-amplitude signal at high frequencies [26].

The coherence for this motif is

$$\phi(\omega) = \frac{\rho\lambda}{\omega^2 + \lambda(\lambda+\rho)}, \qquad (37)$$



showing that the information content of the output signal decreases at high frequencies. This is because the coherence depends on the input power spectrum $P_{ss}(\omega)$, which itself scales as $\omega^{-2}$ for ω ≫ λ; the information content of the input signal is itself reduced at high frequencies. The mutual information rate (in nats per unit time) for this motif is

$$R(S,X) = \frac{\lambda}{2}\left[\sqrt{1+\rho/\lambda} - 1\right]. \qquad (38)$$
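Eq. 38 can be checked in the same way. Here SNR(ω) is taken as the product of Eq. 34 with an assumed birth-death input spectrum $P_{ss}(\omega) = 2\lambda\langle S\rangle/(\omega^2+\lambda^2)$. This spectrum is our assumption, and ⟨S⟩ cancels, giving SNR(ω) = ρλ/(ω²+λ²). The decay rate μ never enters. The Python quadrature helper is repeated so that the sketch runs on its own.

```python
import math


def rate_from_snr(snr, scale, n=20000):
    """R = (1/4pi) * integral over all w of ln(1 + SNR(w)), for an even SNR(w)."""
    h = (math.pi / 2) / n
    total = 0.0
    for i in range(n):
        theta = (i + 0.5) * h
        w = scale * math.tan(theta)  # map [0, pi/2) onto [0, inf)
        total += math.log1p(snr(w)) * scale / math.cos(theta) ** 2
    return 2.0 * total * h / (4.0 * math.pi)


def snr_motif3(w, lam, rho):
    # Eq. (34) times the assumed input spectrum 2 lam <S> / (w^2 + lam^2); <S> cancels
    return rho * lam / (w**2 + lam**2)


def rate_motif3(lam, rho):
    """Eq. (38), in nats per unit time; the X decay rate mu does not appear."""
    return 0.5 * lam * (math.sqrt(1 + rho / lam) - 1)


lam, rho = 1.0, 3.0  # illustrative rates
numeric = rate_from_snr(lambda w: snr_motif3(w, lam, rho), scale=lam)
print(numeric, rate_motif3(lam, rho))  # both approximately 0.5
```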
As we saw for motif II, both the coherence and information rate are independent of μ, the decay rate of X. As discussed above, since decay events of X occur independently of the input signal S, the information we can gain about the input signal is the information encoded in the production of X.

We may therefore be tempted to conclude that the decay events of X contain no information about the input signal s(t). This is not entirely true. The information encoded in the decay events of X can be quantified by considering a modified network combining motifs II and III as follows:

$$S \xrightarrow{\rho} S + X, \quad \emptyset \xrightarrow{k} S, \quad S \xrightarrow{\lambda} \emptyset, \quad X \xrightarrow{\mu} Y, \quad Y \rightarrow \emptyset. \qquad (39)$$



As we saw in the previous section, in motif II information about the input signal is encoded in the timing of X production reactions. Similarly, in the set of reactions in Eq. 39, information about S will be encoded in the timing of Y production reactions. Since the reactions producing Y correspond to decays of X, the mutual information between s(t) and y(t) will also be the mutual information between s(t) and the decays of X. In the limit μ → ∞, this set of reactions reduces to motif III: as X molecules decay immediately, the two reactions S → S + X and X → Y are effectively combined into S → S + Y. However, for finite μ we find that the mutual information rate between S and Y,



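$$R(S,Y) = \frac{\lambda}{2}\left[\sqrt{1 + 2\alpha\sqrt{1+\beta} + \alpha^2} - (1+\alpha)\right], \qquad (40)$$

where α = μ/λ and β = ρ/λ, is reduced compared to Eq. 38. Additionally, the gain-to-noise ratio and coherence,

$$\frac{g^2(\omega)}{N(\omega)} = \frac{\rho}{2\langle S\rangle}\,\frac{\mu^2}{\omega^2+\mu^2}, \qquad (41)$$

$$\phi(\omega) = \frac{\mu^2\rho\lambda}{\mu^2\rho\lambda + (\omega^2+\lambda^2)(\omega^2+\mu^2)}, \qquad (42)$$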



are reduced at all non-zero frequencies compared to Eq. 34 and Eq. 37. Therefore we can see that the decay events of X do provide some information about the trajectory s(t), but less than can be obtained from observing the production of X. Since decays of X take place independently of S, from observing the decay of X we can only estimate, based on the lifetime of X, a distribution of times in the past when an S molecule was present. However, observing the production of an X molecule at a given time t tells us directly that at least one S molecule is present. We conclude that in the case where we can observe the entire trajectory of X, decay events provide us with no additional information which we could not already extract from the production events contained in the trajectory.
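Eqs. 38 and 40 can be compared numerically in a Python sketch. The SNR for Y is taken as Eq. 41 times the same assumed input spectrum $2\lambda\langle S\rangle/(\omega^2+\lambda^2)$. The check confirms that the rate carried by X decay events stays below Eq. 38 and approaches it as μ grows. The rates are illustrative.

```python
import math


def rate_from_snr(snr, scale, n=20000):
    """R = (1/4pi) * integral over all w of ln(1 + SNR(w)), for an even SNR(w)."""
    h = (math.pi / 2) / n
    total = 0.0
    for i in range(n):
        theta = (i + 0.5) * h
        w = scale * math.tan(theta)
        total += math.log1p(snr(w)) * scale / math.cos(theta) ** 2
    return 2.0 * total * h / (4.0 * math.pi)


def rate_x(lam, rho):
    """Eq. (38): rate carried by X production events (independent of mu)."""
    return 0.5 * lam * (math.sqrt(1 + rho / lam) - 1)


def rate_y(lam, rho, mu):
    """Eq. (40): rate between s(t) and the Y trajectory, i.e. the X decay events."""
    a, b = mu / lam, rho / lam
    return 0.5 * lam * (math.sqrt(1 + 2 * a * math.sqrt(1 + b) + a * a) - (1 + a))


def snr_y(w, lam, rho, mu):
    # Eq. (41) multiplied by the assumed input spectrum 2 lam <S> / (w^2 + lam^2)
    return rho * lam * mu**2 / ((w**2 + mu**2) * (w**2 + lam**2))


lam, rho, mu = 1.0, 3.0, 2.0  # illustrative rates
numeric = rate_from_snr(lambda w: snr_y(w, lam, rho, mu), scale=lam + mu)
print(numeric, rate_y(lam, rho, mu), rate_x(lam, rho))
for big_mu in (10.0, 100.0, 1e4):
    print(big_mu, rate_y(lam, rho, big_mu))  # approaches rate_x(lam, rho) = 0.5
```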



4. Comparison of different motifs

It is important to realise that in motifs I and II consid- 
ered above the dynamics of the detection reaction affects 
the statistics of the signal. The gain-to-noise ratios for 
these motifs are therefore not intrinsic network proper- 
ties, but also depend on the input process. This can be 
seen from the appearance of the degradation rate for the 
input species, λ, in the expressions for the gain-to-noise
ratio. In general, therefore, the ensemble of input signals 
will be different for each motif, making it problematic to 
directly compare the different information rates. How- 
ever, we can make a useful comparison in cases in which 
the input signals have the same form. 

First we shall examine motifs II and III, and consider the special case where the statistics of s(t) are the same for both reactions. To achieve this we choose λ = 0 in motif II, and λ = ρ in motif III. Both motifs can then be described by the same macroscopic evolution equations,

$$\frac{ds}{dt} = k - \rho s(t) + \eta_s(t), \qquad (43a)$$

$$\frac{dx}{dt} = \rho s(t) - \mu x(t) + \eta_x(t). \qquad (43b)$$



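However, the noise correlations will be different in the two cases: for motif II we have $\langle\eta_s(t)\eta_x(t')\rangle = -\rho\langle S\rangle\delta(t-t')$, while for motif III $\langle\eta_s(t)\eta_x(t')\rangle = 0$. Calculating the coherence of these two systems gives

$$\phi_{\rm II}(\omega) = \frac{1}{4}, \qquad (44)$$

$$\phi_{\rm III}(\omega) = \frac{1}{2 + \omega^2/\rho^2}. \qquad (45)$$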
(45) 



Since the input power spectra $P_{ss}(\omega)$ are the same for these two networks, we can use the coherence directly as a comparison of signalling performance for signals of different frequencies. We see that at low frequencies, ω < √2 ρ, motif III provides more information, while for high frequencies, ω > √2 ρ, motif II allows for more reliable signal transmission.
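The crossover frequency between Eqs. 44 and 45 can be located by bisection. This small Python sketch, with an arbitrary ρ, recovers ω = √2 ρ.

```python
import math


def phi_motif2(omega, rho):
    return 0.25  # Eq. (44): frequency independent when lam = 0


def phi_motif3(omega, rho):
    return 1.0 / (2.0 + (omega / rho) ** 2)  # Eq. (45) with lam = rho


def crossover(rho, lo=0.0, hi=None, iters=200):
    """Bisection for the frequency at which the two coherences are equal."""
    hi = 10.0 * rho if hi is None else hi
    f = lambda w: phi_motif3(w, rho) - phi_motif2(w, rho)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:  # motif III still ahead: crossover lies at higher frequency
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


rho = 2.0  # illustrative
print(crossover(rho), math.sqrt(2) * rho)  # both approximately 2.828
```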

To understand these differing responses of X to signals at different timescales, we consider the information that can be extracted from the trace x(t) about S molecules with different lifetimes, as depicted in Fig. 2. The spontaneous decay reaction we have assumed for S means that the lifetimes of different S molecules will be exponentially distributed with mean 1/ρ. Recall also that the information about the signal s(t) is contained in the timing of the production reactions for X. Since we have taken λ = 0 in motif II, we know that the decay of every S molecule corresponds to the production of an X. We therefore obtain the same amount of information about the production and decay of each S molecule, regardless of its lifetime: we learn exactly the time at which it decays. In addition, we can estimate, based on the typical lifetime of an S molecule, the time at which the molecule was produced. However, for S molecules with a lifetime much longer or shorter than the average, this estimate will be inaccurate.

Motif   g²(ω)/N(ω)
I       $\frac{\sigma\rho}{2\langle Q\rangle}\,\frac{\lambda}{\omega^2+\lambda(\lambda+\rho)}$
Ia      $\frac{\sigma\rho}{2\langle Q\rangle}\,\frac{\lambda(\lambda+\rho+\mu)}{(\lambda+\mu)\omega^2+\lambda(\lambda+\rho)(\lambda+\rho+\mu)}$
II      $\frac{\sigma\rho}{2\langle Q\rangle}\,\frac{\lambda+\rho}{\omega^2+(\lambda+\rho)^2}$
III     $\frac{\sigma\rho}{2\langle Q\rangle}\,\frac{\lambda}{\omega^2+\lambda(\lambda+\rho)}$

TABLE II: The gain-to-noise ratio of the reaction motifs in Table I when driven by an upstream signal Q via $Q \xrightarrow{\sigma} Q + S$. The row labelled Ia corresponds to motif I but with the additional reaction $X \xrightarrow{\lambda} \emptyset$.

FIG. 2: Comparison of X production in motifs II and III for S molecules with different lifetimes. In each panel the upper plot shows an example of the birth and death of S molecules. The second shows the corresponding timing of X production in motif II. The lower trace shows an example of X production events in motif III. (a) For some short-lived S molecules no X will be produced in motif III. We therefore lose information about these signals, in contrast to motif II, for which all S molecules are detected. (b) For long-lived S molecules, motif III allows for a more accurate estimate of the production time, since more than one X molecule can be produced. This increases the total information about s(t) compared to motif II.

In motif III, on the other hand, X molecules will be produced at an average rate ρ for each S molecule present. Thus for S molecules which decay extremely rapidly, with a short lifetime τ ≪ 1/ρ, the probability of producing an X molecule will be small. We effectively do not detect these S molecules at all, and hence gain no information about their contribution to the trajectory s(t). This is in marked contrast to motif II, for which we know that we will detect all S molecules. On the other hand, for S molecules with a much longer lifetime τ ≫ 1/ρ, more than one X molecule will be produced on average. From this sequence of production events we can estimate both the production and decay times of the corresponding S. Importantly, while we do not get as much information about the decay time as we would in motif II, our estimate of the production time will be more accurate than motif II allows. The total information gained about both the production and decay times may therefore be higher. On average, motif III provides more information than motif II about S molecules with a longer than average lifetime, but less information about S molecules with a short lifetime. At a more macroscopic level, motif III effectively amplifies slowly-varying signals, producing more than one X molecule for each S, but averages over extremely rapid changes in S, producing less than one X per S. Motif II transmits all signals with a similar amplitude, regardless of the timescale of the input.

Another way to construct a system in which the reactions under comparison do not affect the input signal is to place the input signal upstream of S, and to use this uncorrelated process to drive the reactions of interest. We therefore add to each motif in Table I a signal Q, which drives the production of S via the reaction $Q \xrightarrow{\sigma} Q + S$. This choice ensures that the input signal, now q(t), is uncorrelated with the noise within the downstream signalling network, and hence that the input power spectrum $P_{qq}(\omega)$ is unchanged by the network dynamics. The fidelity of signal transmission between q(t) and the output x(t) can then be quantified by the gain-to-noise ratio, $g^2(\omega)/N(\omega)$. The gain-to-noise ratios of these modified reaction motifs are shown in Table II.

The expressions in Table II have a number of interesting features. Firstly, we note that for equal reaction
rates the gain-to-noise ratios of motifs I and III are iden- 
tical. In both motifs I and III we can detect the same 
S molecule many times, either by repeated switching be- 
tween the S and X forms or because one S can stimulate 
the production of several X molecules. For these two 
motifs, with identical rates, one can straightforwardly 
show that the distribution of the number of times a given 
molecule converts from S to X and back in motif I is iden- 
tical to the distribution of the number of X molecules 
produced from a single S molecule in motif III. Thus the 
strength of this "interference" between repeated detec- 
tions of the same S molecule is the same in the two mo- 
tifs; from the point of view of information transmission, 
the two reactions are then equivalent. 

In the limit ω → 0, all three expressions tend to the same value: all three motifs perform equally well for the transmission of slowly-varying signals. For ω > 0 the gain-to-noise ratio for motif II becomes larger than that of either motif I or III, showing that motif II is able to transmit more information. Comparing motifs II and III, at low frequencies the gain is lower in motif II, as we found above when considering signal transmission from S to X. However, the noise is also lower in motif II at all frequencies, because intrinsic noise in the production of S does not propagate to X (see Appendix A for further details). Comparing motifs I and II, the situation is more complex. At low frequencies the gain is smaller in motif II; however, the noise is once again also lower in motif II, making these motifs perform equally well for slowly-varying signals. At frequencies ω² > μ(λ + ρ/2) the gain is larger for motif II, as the switching reaction S ⇌ X is less able to track rapid changes in S. At intermediate frequencies it is possible for motif I to have a lower noise power than motif II. However, in this regime the gain in motif I is significantly smaller than that of motif II, and so motif II is still able to transmit more information at these frequencies.
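The gain threshold ω² > μ(λ + ρ/2) follows from comparing the squared gains of the Q-driven motifs I and II. The Python sketch below assumes the linear-response gains $|g_{\rm I}|^2 = \sigma^2\rho^2/[(\lambda\mu-\omega^2)^2+\omega^2(\lambda+\rho+\mu)^2]$ and $|g_{\rm II}|^2 = \sigma^2\rho^2/[(\omega^2+(\lambda+\rho)^2)(\omega^2+\mu^2)]$. These are our own derivation and are not given in the text. The sketch finds their crossing by bisection, using illustrative rates.

```python
import math


def gain2_motif1(w, sigma, lam, rho, mu):
    """Assumed |g_I(w)|^2 for motif I driven by Q (own derivation)."""
    d2 = (lam * mu - w**2) ** 2 + w**2 * (lam + rho + mu) ** 2
    return (sigma * rho) ** 2 / d2


def gain2_motif2(w, sigma, lam, rho, mu):
    """Assumed |g_II(w)|^2 for motif II driven by Q (own derivation)."""
    return (sigma * rho) ** 2 / ((w**2 + (lam + rho) ** 2) * (w**2 + mu**2))


def gain_crossover(sigma, lam, rho, mu, lo=0.0, hi=1e3, iters=200):
    """Bisection for the frequency above which motif II has the larger gain."""
    diff = lambda w: gain2_motif2(w, sigma, lam, rho, mu) - gain2_motif1(w, sigma, lam, rho, mu)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if diff(mid) < 0:  # motif I still has the larger gain at this frequency
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


sigma, lam, rho, mu = 1.0, 1.0, 2.0, 0.5  # illustrative rates
print(gain_crossover(sigma, lam, rho, mu), math.sqrt(mu * (lam + rho / 2)))  # both approximately 1.0
```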

We can also consider a modified version of motif I in which X molecules can also degrade spontaneously via the same reaction as S, i.e. $X \xrightarrow{\lambda} \emptyset$. We denote this set of reactions by motif Ia, and the corresponding gain-to-noise ratio is shown in Table II. The probability of an X molecule switching back to the S form depends on the relative rates of the two possible decay reactions for X, $X \xrightarrow{\mu} S$ and $X \xrightarrow{\lambda} \emptyset$, and is given by μ/(μ+λ). In the limit μ → 0, this motif reduces to motif II: X molecules never switch back to S but instead always decay. In the limit μ → ∞, X molecules always return to the S state, and hence we recover motif I. By varying μ we therefore change the number of times the same molecule switches between the S and X states. We would therefore expect the gain-to-noise ratio for this motif to increase with decreasing μ. We can see from Table II that for any finite non-zero value of μ the gain-to-noise ratio of motif Ia is indeed larger at all non-zero frequencies than that of motif I, but not as large as that of motif II. As μ is decreased, the gain-to-noise ratio for motif Ia interpolates smoothly between those of motifs I and II, as shown in Fig. 3.
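The ordering and interpolation described for Table II can be checked directly. This Python sketch drops the common prefactor σρ/(2⟨Q⟩) and compares the frequency-dependent factors for motifs I (identical to III), Ia and II. It uses ρ = 10λ, as in Fig. 3, and a few arbitrary values of μ. The factors are written out in the function docstrings so the sketch does not depend on the table layout.

```python
def shape_i(w, lam, rho):
    """Motifs I and III: lam / (w^2 + lam (lam + rho))."""
    return lam / (w**2 + lam * (lam + rho))


def shape_ia(w, lam, rho, mu):
    """Motif Ia: lam (lam+rho+mu) / ((lam+mu) w^2 + lam (lam+rho)(lam+rho+mu))."""
    return lam * (lam + rho + mu) / ((lam + mu) * w**2 + lam * (lam + rho) * (lam + rho + mu))


def shape_ii(w, lam, rho):
    """Motif II: (lam + rho) / (w^2 + (lam + rho)^2)."""
    return (lam + rho) / (w**2 + (lam + rho) ** 2)


lam, rho = 1.0, 10.0  # rho = 10 lam, as quoted for Fig. 3
for w in (0.0, 1.0, 10.0, 100.0):
    row = [shape_i(w, lam, rho)]
    row += [shape_ia(w, lam, rho, mu) for mu in (100.0, 1.0, 0.01)]  # decreasing mu
    row.append(shape_ii(w, lam, rho))
    print(w, " ".join(f"{v:.5f}" for v in row))
```

Each printed row increases from left to right for ω > 0, and all entries coincide at ω = 0.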

Finally, we examine motifs II and III with the combination of both an upstream signal Q and the parameter choices of Eq. 43, λ = 0 in motif II and λ = ρ in motif III. Specifically, we consider two networks described by

$$\frac{ds}{dt} = \sigma q(t) - \rho s(t) + \eta_s(t), \qquad (46a)$$

$$\frac{dx}{dt} = \rho s(t) - \mu x(t) + \eta_x(t), \qquad (46b)$$

where as before $\langle\eta_s(t)\eta_x(t')\rangle = -\rho\langle S\rangle\delta(t-t')$ or $\langle\eta_s(t)\eta_x(t')\rangle = 0$ for motifs II and III respectively. For these parameter choices the power spectra of Q and S are the same in both networks.

The gain-to-noise ratios between the input signal q(t) and output x(t) for the two networks are

$$\left.\frac{g^2(\omega)}{N(\omega)}\right|_{\rm II} = \frac{\sigma\rho}{2\langle Q\rangle}\,\frac{\rho}{\omega^2+\rho^2}, \qquad (47)$$

$$\left.\frac{g^2(\omega)}{N(\omega)}\right|_{\rm III} = \frac{\sigma\rho}{2\langle Q\rangle}\,\frac{\rho}{\omega^2+2\rho^2}. \qquad (48)$$

FIG. 3: Gain-to-noise ratio as a function of frequency between q(t) and x(t) for different motifs. The gain-to-noise ratio is highest for motif II, and lowest for motifs I and III. As μ is varied in motif Ia, the gain-to-noise ratio interpolates between these two extremes. In each case ρ = 10λ was chosen.



Interestingly, we now find that motif II can transmit low-frequency signals (ω ≪ ρ) more reliably than motif III, while for high-frequency signals (ω ≫ ρ) the two reactions perform equally well. This is in contrast to Eqs. 44 and 45, which show that motif III is able to more reliably transmit information about s(t) at low frequencies, while motif II is more reliable at high frequencies (see Fig. 4).
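Eqs. 47 and 48 are taken here to be the Table II rows for motifs II and III evaluated at λ = 0 and λ = ρ respectively. These are $\frac{\sigma\rho}{2\langle Q\rangle}\frac{\rho}{\omega^2+\rho^2}$ and $\frac{\sigma\rho}{2\langle Q\rangle}\frac{\rho}{\omega^2+2\rho^2}$. This Python sketch shows that their ratio falls from 2 at low frequency to 1 at high frequency. The value of ρ is illustrative.

```python
def gnr_motif2(w, rho):
    """Eq. (47) without the common prefactor sigma*rho/(2<Q>)."""
    return rho / (w**2 + rho**2)


def gnr_motif3(w, rho):
    """Eq. (48) with the same normalisation."""
    return rho / (w**2 + 2 * rho**2)


rho = 1.0  # illustrative
for w in (1e-3, 0.1, 1.0, 10.0, 1e3):
    ratio = gnr_motif2(w, rho) / gnr_motif3(w, rho)
    print(f"w = {w:g}: motif II / motif III = {ratio:.6f}")
```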

To understand how the fidelity of signal transmission changes for different input signals, we must consider the different sources and propagation of noise in these networks (see Appendix A for more details). If the input signal is taken to be s(t), then any fluctuations in S, regardless of their origin, contribute to the input signal. However, when we are concerned with the transmission of q(t), fluctuations in s(t) that are uncorrelated with q(t) are considered noise: we wish to transmit only those changes in S that are caused by Q. For the two motifs considered here, the mean response to an input signal q(t), measured by the gain between q(t) and x(t), is the same. However, the propagation of intrinsic noise in S differs between the two motifs. We saw previously that motif III is able to amplify fluctuations in S at low frequencies (ω ≪ ρ). The fluctuations which made up the signal s(t) in the network described by Eq. 43 now correspond, in Eq. 46, to intrinsic noise in the production and decay of S that is independent of q(t). If the input signal of interest is s(t), then amplification of these fluctuations is beneficial, as we saw in Fig. 2, because it allows for better resolution of different s(t) signals. However, if the input signal is q(t), then amplifying intrinsic fluctuations in S, which are uncorrelated with Q, provides no information about q(t); indeed this noise obscures the desired signal. Motif II, meanwhile, does not suffer from this problem, as it cannot amplify intrinsic fluctuations in S; indeed, noise in the production of S is not propagated to X. Consequently, at low frequencies the noise in the transmission of q(t) is larger in motif III than in motif II, reducing the fidelity of transmission of these signals. For high-frequency (ω ≫ ρ) signals, we saw previously that motif III is unable to track changes in S, and hence the transmitted noise decreases. The total noise power for motif III is then dominated by intrinsic noise in the production and decay of X, which is the same as that found in motif II. Therefore, the gain-to-noise ratios of the two motifs become the same for high-frequency signals.

FIG. 4: Comparison of the performance of motifs II and III for the transmission of different input signals. (a) The gain-to-noise ratio for the transmission of s(t) to x(t) for the basic motifs, with parameters chosen according to Eq. 43: λ = 0 in motif II and λ = ρ in motif III. (b) Gain-to-noise ratio for the transmission of q(t) to x(t), as in Eq. 46, with the same parameter choice.



IV. DISCUSSION 

In this paper we have considered the transmission of 
time-varying signals through elementary biochemical re- 
actions. We have considered the transmission of both 
instantaneous signals and of complete trajectories. Our 
results show that these reactions can have radically dif- 
ferent information characteristics for these two types of 
signals. Most strikingly, for the irreversible modifica- 
tion reaction in motif II the instantaneous information 
is zero, yet for trajectories we find an extremely large 
mutual information rate. Additionally for reaction mo- 
tif III the mutual information rate is independent of the 
decay rate of the output component, but the instanta- 
neous mutual information does depend on this parame- 
ter. These two information quantities therefore measure 
different aspects of the signalling behaviour of networks, 
both of which may be important in different biochemical 
systems. 

The reliability with which a network can transmit a 
particular frequency component of the input signal tra- 
jectory is determined by the gain-to-noise ratio of the net- 
work as a function of frequency. For systems that obey 
the spectral addition rule [Hj], that is those for which 
downstream reactions do not affect the input signal, the 
gain-to-noise ratio is an intrinsic property of the process- 
ing network. For networks that do not obey the spectral 
addition rule the gain-to-noise ratio will be dependent on 
the statistics of the input signal. The mutual information 
between input and output signals, which quantifies the 
information which can be transmitted about a particular 
input ensemble, also depends on the particular choice of 
the input signal. Thus when comparing the mutual in- 
formation in different motifs, or the gain-to-noise ratio 
of motifs for which the statistics of the input signal are 
affected by the network, care should be taken to ensure 
that the input distributions of the different networks are 
the same. 

Recently, Endres and Wingreen showed [37] that an
absorbing detector is able to measure a steady-state
ligand concentration more accurately than either a detector
that allows passive observation of the local concentration
or a detector to which a ligand can bind reversibly.
These different situations are analogous to our motifs II,
III and I, respectively. It has also been observed previously
that the irreversible conversion reaction S → X reduces
the propagation of noise through networks [H], [38].
Our results in Section III B 4 show that the optimal reaction
for transmitting time-varying signals depends on the
input signal that the network is attempting to transmit.
If the signal is the concentration of the ligand itself, as in
Fig. 2(a), then an absorbing detector is beneficial only
for rapidly varying ligand signals, since this reaction does
not allow for amplification of slowly varying signals. To
detect low-frequency changes in the ligand signal it is
better to respond via a reaction that does not consume
the ligand molecule. On the other hand, if the ligand is
itself a reporter for another upstream process, then the
production reaction of motif III amplifies intrinsic noise
in the signalling network at low frequencies, obscuring
the signal of interest. In this case, as we saw in Figs.
3 and 2(b), the use of an absorbing detector is always
preferable.

The qualitative difference in the relative performance 
of motifs II and III for different input signals, shown in 
Fig. 2, has important consequences for the design of
signalling cascades. Intuitively, we might expect that
selecting the reaction for each level of the cascade which pro-
vides the most information about the dynamics at the 
previous level would optimise information transmission 
for the entire cascade. However, Fig. 4 shows that this
is not the case. In fact, choosing the reaction which opti- 
mises the reliability of signalling at one downstream step 
can reduce the overall information transmitted through 
the cascade if more intrinsic noise is propagated. Our 
results suggest that even in the absence of feedback it is 
vital to consider the transmission properties of signalling 
cascades as a whole, rather than isolating individual
reaction steps. Naturally occurring signalling cascades may
consist of a number of steps which are individually sub- 
optimal in terms of one component tracking fluctuations 
in another, but which together optimise the transmission 
of the signal of interest while minimising the impact of 
noise within the network. 

We have observed that the information rates for reac- 
tion motifs II and III are independent of the decay rate 
of the output component X. This decay rate sets the 
relaxation time for intrinsic fluctuations in X and is of- 
ten considered to set the timescale on which X is able 
to respond to signals. However, in terms of information 
transmission this is not the case. Instead of the dissi- 
pative timescale, the ability of X to respond to changes 
in the input signal is determined by the rate of production
events. Importantly, the mutual information is independent
only of the decay rate of the output component
of a signalling pathway. Information transmission does
depend on the relaxation rates of intermediate signalling
components. Thus, if X is taken as the input to another
process, then the overall information rate will depend on
its relaxation rate, as we saw when considering the
decay of X to Y in Section III B 3. More generally, our
results indicate that signals in biochemical networks can 
be encoded in the timing of specific reaction events. To 
understand whether cells can exploit this information it 
will be important to investigate situations in which dif- 
ferent encoding strategies are employed in vivo, and to 
understand the ability of cells to decode this information. 
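The independence of the information rate from the decay rate of X can be checked numerically. The sketch below is illustrative and rests on assumptions not stated in this section: it uses the standard expression for the mutual information rate between stationary Gaussian trajectories, R = (1/4π)∫ ln[1 + Σ(ω)/N(ω)] dω, and assumes that for motif III with S as input the signal and noise spectra are Σ(ω) = ρ²P_ss(ω)/(ω²+μ²) and N(ω) = Γ_xx/(ω²+μ²), with ρ the decay rate of S, μ that of X, Γ_ss = Γ_xx, and P_ss(ω) = [α²P_qq(ω) + Γ_ss]/(ω²+ρ²). The helper names, the Lorentzian input spectrum and all parameter values are hypothetical.

```python
import numpy as np


def info_rate(gain_to_noise, scale, n=200_000):
    """Gaussian-channel information rate (nats per unit time),
    R = (1/4pi) * integral over all omega of ln(1 + G(omega)), for even G.
    Substitutes omega = scale * tan(theta) and applies a midpoint rule on (0, pi/2)."""
    dtheta = (np.pi / 2) / n
    theta = (np.arange(n) + 0.5) * dtheta
    omega = scale * np.tan(theta)
    jacobian = scale / np.cos(theta) ** 2           # d omega / d theta
    half_line = np.sum(np.log1p(gain_to_noise(omega)) * jacobian) * dtheta
    return 2.0 * half_line / (4.0 * np.pi)          # even integrand: double the half line


def gnr_motif_iii(omega, rho, mu, alpha, p_qq, gamma):
    """Assumed Sigma/N for S -> X in motif III, with Gamma_ss = Gamma_xx = gamma."""
    d_x = omega**2 + mu**2                          # X relaxes at rate mu
    p_ss = (alpha**2 * p_qq(omega) + gamma) / (omega**2 + rho**2)
    signal = rho**2 * p_ss / d_x
    noise = gamma / d_x
    return signal / noise


# Hypothetical values, for illustration only
rho, alpha, gamma = 1.0, 2.0, 200.0
lorentzian = lambda w: 2 * 25.0 / 5.0 / (w**2 + 5.0**-2)   # hypothetical P_qq
no_input = lambda w: np.zeros_like(w)                       # P_qq = 0


def rate_for(mu, p_qq):
    return info_rate(lambda w: gnr_motif_iii(w, rho, mu, alpha, p_qq, gamma), scale=rho)


rates = {mu: rate_for(mu, lorentzian) for mu in (0.1, 1.0, 10.0)}
r0 = rate_for(0.3, no_input)
print(rates)
print(r0, (np.sqrt(2) - 1) * rho / 2)
```

Because the factor 1/(ω²+μ²) multiplies both the signal and the noise, it cancels from their ratio, so the three rates agree to rounding error. With P_qq = 0 the integral has the closed form R = (√2 − 1)ρ/2, which the last line reproduces.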

The mutual information between trajectories, as we 
have calculated here, is a measure of how reliably sig- 
nals can be transduced through networks. However, it 
is known that many biochemical networks also perform 
other signal-processing functions, such as filtering high-frequency
[39] or low-frequency input signals [21], [23].
While these response characteristics of networks appear
in the gain-to-noise ratio, as we have shown here and
previously [26]-[28], the mutual information does not
distinguish between the properties of the input signal that
the cell wishes to respond to and those with which the
cell is not concerned. In these cases a more biologically
meaningful measure of the performance of the network
would be the mutual information between the properties 
of interest in the input signal and the output trajectory. 
For example, if a cell wishes to decode the frequency of an 
oscillating input signal into the amplitude of a messenger 
signal, but is not concerned with the amplitude or phase, 
a more appropriate measure of signalling would be the 
mutual information between the input signal frequency 
and the output signal amplitude. The appropriate input
and output signals must be chosen on a case-by-case
basis, which relies on our understanding of the biological
function of the particular system being considered. We
hope to address these issues in more detail in future work. 

Throughout this paper we have calculated information
transmission for a Gaussian model of the network of
interest. As discussed in Section II, such a model is also
able to provide a lower bound on the channel capacity
for non-Gaussian systems, provided that the Gaussian
model is chosen appropriately. Even for the linear systems
considered in Section III, the calculated results are
strictly only lower bounds on the channel capacity of real
biochemical systems. It is known that for many such systems,
particularly for copy numbers of order a few hundred
molecules, which are often found in signalling networks,
the approximation of small Gaussian noise is very
accurate [17], [27], [32], [33]. However, in most cases it is not
clear what the typical input distributions of biochemi- 
cal networks in natural environments are. It is therefore 
difficult to quantify the impact of assuming a Gaussian 
input distribution in this analysis. We hope that future 
experiments will clarify the typical distributions of envi- 
ronmental stimuli to which cells are exposed, and which 
are crucial in determining how information is propagated 
through networks. 
Appendix A: Comparison of signal and noise
partitioning for motifs II and III 

Here we consider in more detail the partitioning of the
output power P_xx(ω) into signal and noise contributions
for transmission from Q to X and from S to X. From
the Langevin equations in Eq. (46), where λ = ρ in motif II
and we have chosen λ = ρ in motif III, the output power
spectrum of both motifs can be written as



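$$
P_{xx}(\omega) = \frac{\lambda^2}{\omega^2+\mu^2}\,\frac{\alpha^2}{\omega^2+\rho^2}\,P_{qq}(\omega) + \frac{\lambda^2}{\omega^2+\mu^2}\,\frac{\Gamma_{ss}}{\omega^2+\rho^2} + \frac{\Gamma_{xx}}{\omega^2+\mu^2} + \frac{2\lambda\rho\,\Gamma_{sx}}{(\omega^2+\mu^2)(\omega^2+\rho^2)}, \tag{A1}
$$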



where the Γ_αβ factors describe the various noise strengths
and correlations and are defined by ⟨η_α(t)η_β(t′)⟩ =
Γ_αβ δ(t − t′). We note that while we do not specify a
particular process or power spectrum for q(t), we have
assumed that the noise in S is uncorrelated with Q:
⟨q(t)η_s(t′)⟩ = 0 for all t and t′. The first term in
Eq. (A1) describes the influence of q(t) on x(t) in the absence
of noise, and characterises the mean response of x(t) to
changes in q(t). The two prefactors to P_qq(ω) represent
the effective response functions at the two levels of the
cascade, the transmission of Q to S and of S to X. These
transfer functions show that high-frequency signals are
attenuated at each step. The second term in Eq. (A1)
describes intrinsic noise in the production and decay of
S molecules. The third term similarly describes intrinsic
noise in the production and decay of X. For the reactions
and parameters we are considering, these three terms are
the same for both motifs, with Γ_ss = Γ_xx = 2α⟨Q⟩.
The final term contains corrections to the two previous
noise terms due to correlations between η_s(t) and η_x(t).
For motif III we have Γ_sx = 0. However, for motif II
Γ_sx = −α⟨Q⟩, and this term precisely cancels the
intrinsic noise in S. This latter noise, the second term in
Eq. (A1), contains contributions from the production reactions
of S and from its decay reactions. Since this decay occurs
via the reaction S → X in motif II, these events
are also included in the term describing intrinsic noise
in the production of X, the third term in Eq. (A1). The
negative cross-correlation term therefore eliminates the
double-counting of these events. Furthermore, the cross-correlation
term also removes fluctuations in X due to
noise in the production of S. This noise does not
propagate to X in motif II because, in the regime of small
fluctuations around steady state, the spontaneous
production and decay of S molecules are two independent
Poisson processes; hence noise in the production of S is
uncorrelated with the production of X. This is in
contrast to motif III, for which production of X molecules
can occur whenever an S is present, and hence the signal
x(t) depends on the noise in both the production and
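decay of S. Ultimately, for the two motifs we have

$$
P^{\mathrm{II}}_{xx}(\omega) = \frac{\rho^2\alpha^2 P_{qq}(\omega)}{(\omega^2+\mu^2)(\omega^2+\rho^2)} + \frac{\Gamma_{xx}}{\omega^2+\mu^2}, \tag{A2}
$$

$$
P^{\mathrm{III}}_{xx}(\omega) = \frac{\rho^2\left[\alpha^2 P_{qq}(\omega)+\Gamma_{ss}\right]}{(\omega^2+\mu^2)(\omega^2+\rho^2)} + \frac{\Gamma_{xx}}{\omega^2+\mu^2}. \tag{A3}
$$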
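The cancellation described above can be checked numerically. The sketch assumes that the four contributions to the output power take the forms λ²α²P_qq/D, λ²Γ_ss/D, Γ_xx/(ω²+μ²) and 2λρΓ_sx/D, with D = (ω²+μ²)(ω²+ρ²), ρ the decay rate of S, μ the decay rate of X and λ = ρ in both motifs. The Lorentzian input spectrum and all parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np


def output_power(omega, p_qq, alpha, rho, mu, lam, gamma_ss, gamma_xx, gamma_sx):
    """Sum of the four assumed contributions to P_xx(omega)."""
    d_x = omega**2 + mu**2                             # X relaxes at rate mu
    d_s = omega**2 + rho**2                            # S relaxes at rate rho
    signal = lam**2 * alpha**2 * p_qq / (d_x * d_s)    # q(t) transmitted via S to X
    s_noise = lam**2 * gamma_ss / (d_x * d_s)          # intrinsic S noise passed on to X
    x_noise = gamma_xx / d_x                           # intrinsic X noise
    cross = 2 * lam * rho * gamma_sx / (d_x * d_s)     # S-X noise cross-correlation
    return signal + s_noise + x_noise + cross


# Hypothetical values, for illustration only
alpha, rho, mu, mean_q = 2.0, 1.0, 0.3, 50.0
var_q, tau_q = 25.0, 5.0
omega = np.logspace(-3, 3, 601)
p_qq = 2 * var_q / tau_q / (omega**2 + tau_q**-2)     # hypothetical Lorentzian input spectrum
gamma = 2 * alpha * mean_q                            # Gamma_ss = Gamma_xx = 2 alpha <Q>

p_ii = output_power(omega, p_qq, alpha, rho, mu, rho, gamma, gamma, -gamma / 2)  # motif II
p_iii = output_power(omega, p_qq, alpha, rho, mu, rho, gamma, gamma, 0.0)        # motif III

# Closed forms with the cross term already applied
d_x, d_s = omega**2 + mu**2, omega**2 + rho**2
p_ii_closed = rho**2 * alpha**2 * p_qq / (d_x * d_s) + gamma / d_x
p_iii_closed = rho**2 * (alpha**2 * p_qq + gamma) / (d_x * d_s) + gamma / d_x
print(np.allclose(p_ii, p_ii_closed), np.allclose(p_iii, p_iii_closed))
```

With Γ_sx = −Γ_ss/2 the cross-correlation term equals −ρ²Γ_ss/D and removes the propagated S noise exactly, so motif II retains only the transmitted signal and the intrinsic noise of X.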
If we consider the transmission of Q to X, then only
fluctuations in x(t) that are correlated with q(t) should
contribute to the signal component of the output power.
It also follows from the definition in Eq. (16) that Σ_{q→x}(ω)
is the same for both motifs,

$$
\Sigma_{q\to x}(\omega) = \frac{\rho^2}{\omega^2+\mu^2}\,\frac{\alpha^2}{\omega^2+\rho^2}\,P_{qq}(\omega). \tag{A4}
$$



The remaining terms in Eq. (A1) form the noise power
spectrum. For motif II the noise in the output signal is

$$
N^{\mathrm{II}}_{q\to x}(\omega) = \frac{\Gamma_{xx}}{\omega^2+\mu^2}, \tag{A5}
$$

while for motif III we have

$$
N^{\mathrm{III}}_{q\to x}(\omega) = \frac{\rho^2\,\Gamma_{ss}}{(\omega^2+\mu^2)(\omega^2+\rho^2)} + \frac{\Gamma_{xx}}{\omega^2+\mu^2}. \tag{A6}
$$

We can therefore see that the total noise is smaller in
motif II than in motif III because, as noted above, noise
in the production of S does not propagate to X. However,
this difference becomes negligible at high frequencies
ω ≫ ρ, because at these frequencies motif III effectively
averages over rapid fluctuations in S and hence the
effect of these fluctuations on X diminishes.
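Because Σ_{q→x}(ω) is identical for the two motifs, the ratio of their gain-to-noise ratios equals the inverse ratio of their noise spectra. The sketch below evaluates that ratio assuming N^II = Γ_xx/(ω²+μ²) and that N^III adds ρ²Γ_ss/[(ω²+μ²)(ω²+ρ²)], with ρ the decay rate of S, μ that of X and Γ_ss = Γ_xx. The parameter values are hypothetical.

```python
import numpy as np


def noise_q_to_x(omega, rho, mu, gamma_ss, gamma_xx, motif):
    """Assumed noise power for Q -> X: intrinsic X noise, plus propagated S noise in motif III."""
    d_x = omega**2 + mu**2          # X relaxes at rate mu
    d_s = omega**2 + rho**2         # S relaxes at rate rho
    noise = gamma_xx / d_x
    if motif == "III":
        noise = noise + rho**2 * gamma_ss / (d_x * d_s)
    return noise


# Hypothetical values, for illustration only
gamma, rho = 200.0, 1.0             # Gamma_ss = Gamma_xx = gamma
omega = np.array([1e-3, 1.0, 1e3])  # well below, at, and well above rho


def noise_ratio(mu):
    """N_III / N_II, which equals GNR_II / GNR_III because the signal powers coincide."""
    return (noise_q_to_x(omega, rho, mu, gamma, gamma, "III")
            / noise_q_to_x(omega, rho, mu, gamma, gamma, "II"))


ratio = noise_ratio(0.3)
ratio_other_mu = noise_ratio(5.0)
print(ratio)
```

With Γ_ss = Γ_xx the ratio reduces to 1 + ρ²/(ω²+ρ²): approximately 2 for ω ≪ ρ, exactly 1.5 at ω = ρ and approaching 1 for ω ≫ ρ, independent of μ.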

Now suppose that we take s(t), rather than q(t), to be
the input signal to the network. Then the signal component
of the output power should include those terms
for which X is correlated with S. For motif III we can
straightforwardly see that the second term in Eq. (A1),
which describes fluctuations in x(t) due to intrinsic noise
in S, should contribute to the signal power. The intrinsic
fluctuations in X remain uncorrelated with S, and hence
are still considered noise. We therefore have

$$
\Sigma^{\mathrm{III}}_{s\to x}(\omega) = \frac{\rho^2}{\omega^2+\mu^2}\left[\frac{\alpha^2 P_{qq}(\omega)+\Gamma_{ss}}{\omega^2+\rho^2}\right], \tag{A7}
$$

$$
N^{\mathrm{III}}_{s\to x}(\omega) = \frac{\Gamma_{xx}}{\omega^2+\mu^2}. \tag{A8}
$$

We can also identify the bracketed terms in Eq. (A7) as
the power spectrum of S, P_ss(ω). For motif II, although
P_xx(ω) does not depend explicitly on η_s(t), we must recall
that noise in the production of X molecules corresponds
to noise in the decay of S, and hence contributes to the
signal power. Applying the definition of the signal power,
Eq. (16), leads to the following result:

$$
\Sigma^{\mathrm{II}}_{s\to x}(\omega) = \frac{\rho^2\left[\alpha^2 P_{qq}(\omega)+\Gamma_{ss}+\Gamma_{sx}\right]^2 + \omega^2\Gamma_{sx}^2}{\left[\alpha^2 P_{qq}(\omega)+\Gamma_{ss}\right](\omega^2+\mu^2)(\omega^2+\rho^2)}, \tag{A9}
$$

$$
N^{\mathrm{II}}_{s\to x}(\omega) = \frac{\Gamma_{xx}}{\omega^2+\mu^2} - \frac{\Gamma_{sx}^2}{\left[\alpha^2 P_{qq}(\omega)+\Gamma_{ss}\right](\omega^2+\mu^2)}. \tag{A10}
$$

The noise in motif II is again lower than in motif III at
all frequencies. In this case, the difference arises because
in motif II noise in the production of X is correlated with
s(t). To study the differences between the signal powers
of the two motifs we consider



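$$
\Delta\Sigma(\omega) \equiv \Sigma^{\mathrm{II}}_{s\to x}(\omega) - \Sigma^{\mathrm{III}}_{s\to x}(\omega) = \frac{\Gamma_{ss}\left[\omega^2\Gamma_{ss} - \rho^2\left(4\alpha^2 P_{qq}(\omega)+3\Gamma_{ss}\right)\right]}{4\left[\alpha^2 P_{qq}(\omega)+\Gamma_{ss}\right](\omega^2+\mu^2)(\omega^2+\rho^2)}, \tag{A11}
$$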



where we have used the facts that Γ_ss is the same for
both motifs and that for motif II Γ_sx = −Γ_ss/2. We can
see that at low frequencies, ω ≪ ρ, ΔΣ(ω) is negative,
showing that the signal power is larger in motif III than
in motif II. This reflects amplification of low-frequency
noise in s(t) by motif III. At high frequencies motif III is
no longer able to amplify noise, and instead at the level
of X averages over these fluctuations. In this regime
ΔΣ(ω) becomes positive, showing that the signal power
is larger in motif II than in motif III. This transition
results in the cross-over observed in Fig. 2(a) between
the different frequency regimes in which motif II or III
can more reliably transmit signals.
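The sign change of ΔΣ(ω) can be reproduced directly from Fourier-transformed Langevin equations. This sketch assumes ds/dt = αq − ρs + η_s and dx/dt = ρs − μx + η_x, takes the signal power to be Σ_{s→x} = |P_xs|²/P_ss (our assumption about the definition of the signal power), uses Γ_sx = −Γ_ss/2 for motif II and Γ_sx = 0 for motif III, and holds P_qq at a constant, hypothetical value. It compares the numerical difference of the two signal powers with the closed form ΔΣ = Γ_ss[ω²Γ_ss − ρ²(4α²P_qq + 3Γ_ss)] / {4[α²P_qq + Γ_ss](ω²+μ²)(ω²+ρ²)}.

```python
import numpy as np

# Hypothetical parameter values; P_qq is held constant across frequencies here
alpha, rho, mu = 2.0, 1.0, 0.3
gamma_ss = 200.0                   # Gamma_ss = 2 alpha <Q>
gamma_sx_ii = -gamma_ss / 2        # motif II: S decay events are X production events
p_qq = 10.0

omega = np.logspace(-2, 2, 401)
a = alpha**2 * p_qq + gamma_ss     # alpha^2 P_qq + Gamma_ss
p_ss = a / (omega**2 + rho**2)     # power spectrum of S
h_x = 1.0 / (1j * omega + mu)      # response of X, which decays at rate mu


def signal_power(gamma_sx):
    """|P_xs|^2 / P_ss with P_xs = h_x * (rho * P_ss + Gamma_sx / (rho - i omega))."""
    cross = h_x * (rho * p_ss + gamma_sx / (rho - 1j * omega))
    return np.abs(cross) ** 2 / p_ss


delta_numeric = signal_power(gamma_sx_ii) - signal_power(0.0)   # motif II minus motif III

delta_closed = gamma_ss * (omega**2 * gamma_ss - rho**2 * (4 * alpha**2 * p_qq + 3 * gamma_ss)) / (
    4 * a * (omega**2 + mu**2) * (omega**2 + rho**2))

# Frequency at which the numerator of the closed form changes sign
omega_star = rho * np.sqrt((4 * alpha**2 * p_qq + 3 * gamma_ss) / gamma_ss)
print(np.allclose(delta_numeric, delta_closed), omega_star)
```

Under these assumptions ΔΣ changes sign at ω* = ρ[(4α²P_qq + 3Γ_ss)/Γ_ss]^{1/2}, which is always at least √3 ρ, so motif III carries more signal power throughout the low-frequency regime ω ≪ ρ.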



